For nearly a century, investigators in the social sciences have used regression
models to deduce cause-and-effect relationships from patterns of association. Path 
models and automated search procedures are more recent developments. In my 
view, this enterprise has not been successful. The models tend to neglect the 
difficulties in establishing causal relations, and the mathematical complexities tend 
to obscure rather than clarify the assumptions on which the analysis is based. 
Formal statistical inference is, by its nature, conditional. If maintained hypotheses 
A, B, C,... hold, then H can be tested against the data. However, if A, B, C,... 
remain in doubt, so must inferences about H. Careful scrutiny of maintained 
hypotheses should therefore be a critical part of empirical work—a principle 
honored more often in the breach than the observance. This paper focuses on 
modeling techniques that seem to convert association into causation. The object is 
to clarify the differences among the various uses of regression, as well as the source 
of the difficulty in making causal inferences by modeling. The discussion will 
proceed mainly by examples, ranging from Yule (J. R. Stat. Soc. 62 (1899),
249-295) to Spirtes, Glymour, and Scheines (“Causation,” Lect. Notes in Statist.,
Vol. 81, Springer-Verlag, New York/Berlin, 1993). © Academic Press


1. OUTLINE 

Many treatments of regression seem to take for granted that the 
investigator knows the relevant variables, their causal order, and the 
functional form of the relationships among them; measurements of 
the independent variables are assumed to be without error. Indeed, Gauss 
developed and used regression in physical science contexts where these 
conditions hold, at least to a very good approximation.¹ Today, the textbook
theorems that justify regression are proved on the basis of such
assumptions. 

* Presented at the Notre Dame Conference on Causality in Crisis, Oct. 15-17, 1993. 

¹ Gauss was fitting orbits to astronomical observations, with least squares to estimate the
elements of the orbits [21]. Stigler [64, pp. 145-146] awards priority to Legendre [36].
In the social sciences, the situation seems quite different. Regression is
used to discover relationships or to disentangle cause and effect. However, 
investigators have only vague ideas as to the relevant variables and their 
causal order; functional forms are chosen on the basis of convenience or 
familiarity; serious problems of measurement are often encountered. 

Regression may offer useful ways of summarizing the data and making 
predictions. Investigators may be able to use summaries and predictions to 
draw substantive conclusions. However, I see no cases in which regression 
equations, let alone the more complex methods, have succeeded as engines 
for discovering causal relationships. Of course, there may be success 
stories that I have not found; nor does a track record of failure necessarily 
project into the future. 

One of the first applications of regression techniques to social science is 
Yule [71]. Recent examples will be found in Spirtes, Glymour, and Scheines
[62], to be cited here as SGS. (The SGS theory is summarized in Glymour
[23], cited as CG.) SGS have attracted considerable attention in the
philosophy of science, because they have developed computerized algorithms
that search for path models. With their algorithms, SGS claim to
make rigorous inferences of causation from association. This is a bold 
claim, which does not survive examination. 

The balance of this paper is organized as follows. Section 2 discusses 
Yule’s work. Sections 3 and 4 explain the critical idea of “exogeneity.”
Section 5 describes a contemporary regression model. Sections 6-10 review
SGS and reanalyze some of their examples. Sections 11-12 canvass
some mathematical issues. Possible responses to my critique will be found 
in Section 13. There is a brief review of the literature in Section 14, and 
conclusions are presented in Section 15. For ease of reference, standard 
formulas for regression are given in an appendix. I have tried to make 
most of the paper accessible to nonstatistical readers, particularly if they 
will permit the occasional undefined technical term; Sections 11 and 12 are 
more specialized. 


2. YULE’S REGRESSION MODEL FOR PAUPERISM 

One of the first regression models in social science was developed by 
Yule—“An Investigation into the Causes of Changes in Pauperism in 
England, Chiefly During the Last Two Intercensal Decades.”² In late 19th
century England, poor people could be supported either inside the poor 
house or outside. Did provision of support outside the poor house increase 
the number of poor people? 

² See [71; 64, pp. 345-358; 11].





Source: https://www.industrydocuments.ucsf.edu/docs/ptgj0001 







To address this issue, Yule used data from the censuses of 1871, 1881, 
and 1891. (In England, the census is taken in years that end with 1.) He 
considered the periods 1871-1881 and 1881-1891, relating changes in the 
number of paupers to changes in the “outrelief ratio,” that is, the ratio 
between the number of paupers supported outside the poor house and 
inside. He used regression to control for two confounders—changes in the 
population and its age structure. 

His equation can be written as follows: 

\[ \Delta\mathrm{Paup} = a + b \times \Delta\mathrm{Out} + c \times \Delta\mathrm{Pop} + d \times \Delta\mathrm{Old} + \mathrm{error}. \tag{1} \]

Here, Δ stands for percentage difference, Paup for the number of paupers,
Out for the outrelief ratio, Pop for population size, and Old for the
proportion of people aged 65 and over.

Yule’s unit of analysis was the “union,” which seems to have been a 
small geographical area like a county.³ He had four kinds of areas: rural,
mixed, urban, metropolitan. He used “Ordinary Least Squares” (OLS) to 
estimate the coefficients from the data, with a “50 cm. Gravet” slide rule 
to do the arithmetic. 

To be more specific, Yule estimated a separate equation for each
time period (1871-81 and 1881-91) and each kind of area. There were 2
time periods and 4 kinds of areas, thus, 2 × 4 = 8 equations. Within a
time period, all areas of the same kind—for instance, all rural unions—are
governed by one equation. (By coincidence, there are 4 coefficients in each 
equation, and 4 kinds of areas.) 

Yule was looking for the “Hooke’s Law of Poverty.” Nature ran an
experiment, with lots of variation over time and geography, and Yule
analyzed the results. Regression was needed to control for the confounding
effects of change in population and age structure. The equations were
held to show that, other things being equal, changes in the outrelief ratio
create corresponding changes in the number of paupers. Indeed, if you
increase the outrelief ratio by one percentage point but hold the other
factors constant, you will increase the number of paupers by b percent, b
being the coefficient of ΔOut in Eq. (1). More qualitatively, if b is
positive, welfare creates paupers. 

For a moment, I turn from Yule to methodology. A regression equation 
like (1) is usually written as 

\[ Y = X\beta + \varepsilon. \tag{2} \]

In this equation, the vector Y represents the dependent variable, like
pauperism; the matrix X represents the explanatory (or “independent”)
variables, like the outrelief ratio, population, and age structure. These are
observable. The vector β represents parameters, which are not observable
but may be estimated from the data; parameters are “social constants,”
which characterize the process that generated the data. In Yule’s equation,
β has four components—the parameters a, b, c, d in Eq. (1). The error or
“disturbance” term ε is also unobservable and represents the impact of
chance factors unrelated to X. Statistical inferences are often based on
“stochastic assumptions” about ε; e.g., ε is independent of X and its
components are independent and identically distributed with mean 0. For
details, see the Appendix.

³ There were about 600 such areas in England. A poor-law union “consisted of two or more
parishes combined for administrative purposes” [64, p. 346].
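To make the notation of Eq. (2) concrete, here is a minimal sketch of an OLS fit in Python with NumPy. The data are synthetic and invented purely for illustration; they are not Yule's figures, and the coefficient values and the helper name `fit_ols` are assumptions of the sketch.

```python
import numpy as np

def fit_ols(X, y):
    """Return the least-squares coefficient vector for y = X @ beta + error."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# Synthetic illustration only: every number below is invented, not Yule's data.
rng = np.random.default_rng(0)
n = 500
explanatory = rng.normal(size=(n, 3))            # stand-ins for three explanatory variables
X = np.column_stack([np.ones(n), explanatory])   # the column of ones carries the intercept a
beta_true = np.array([1.0, 0.5, -0.3, 0.2])      # hypothetical a, b, c, d
epsilon = rng.normal(scale=0.1, size=n)          # disturbance, generated independently of X
y = X @ beta_true + epsilon

beta_hat = fit_ols(X, y)
residuals = y - X @ beta_hat                     # observable; only approximates epsilon
print(beta_hat)
```

In the sketch the estimates land close to `beta_true` only because the data were generated to satisfy the stochastic assumptions; with real data there is no such guarantee.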

Three possible uses for regression equations are 

(i) to summarize data, or

(ii) to predict values of the dependent variable, or 

(iii) to predict the results of interventions. 

Yule could certainly have summarized his data by saying that for a given
time period and unions of a specific type, with certain values of the 
explanatory variables, the change in pauperism was about so much and so 
much. In other words, he could have used his equations to estimate the 
average value of Y, given the values of X. This use of regression may run 
into technical problems if there are outliers, or nonlinearities in the 
regression surface. However, at least in principle, there do seem to be 
technical fixes for such problems. Furthermore, stochastic assumptions 
about the disturbance term play almost no role. Therefore, like most 
statisticians, I believe that regression can be quite helpful in summarizing 
large data sets. 

For prediction, there is a ceteris paribus assumption: the system will 
remain stable. Prediction is already more complicated than description. On 
the other hand, if you make a series of predictions and test them against 
data, it may be possible to show that the system is stable, or sufficiently 
stable for regression to be quite helpful.⁴ Again, any particular use of
regression to make predictions may go off the rails, but there do not seem 
to be essential difficulties of principle involved. 

Causal inference is different, because a change in the system is contemplated;
for example, there will be an intervention. Descriptive statistics tell
you about the correlations that happen to hold in the data; causal models 
claim to tell you what will happen to Y if you change X. Indeed, 
regression is often used to make counterfactual inferences about the past: 
what would Y have been if X had been different? This use of regression
to make causal inferences is the most intriguing—and the most problematic.
Difficulties are created by omitted variables, incorrect functional
form, etc. Of course, if the results of causal modeling were with any
frequency checked against the results of interventions, the balance of
argument might be very different.⁵

⁴ Meehl [41] provides some well-known examples. Predictive validity is best demonstrated
by making real ex ante forecasts in several different contexts: see Ehrenberg and Bound [13].

For description and prediction, the numerical values of the individual 
coefficients fade into the background; it is the whole linear combination 
on the right-hand side of the equation that matters. For causal inference, 
it is the individual coefficients that do the trick. In Eq. (1), for example, it 
is b that should tell you what happens to pauperism when the outrelief 
ratio is manipulated. 

At this remove, the flaws in Yule’s argument may be apparent. For 
example, there seem to be some important variables missing from the 
equation, including variables that measure economic activity. Here is 
Yule’s comment on the last-named factor [71, p. 253]: 

A good deal of time and labour was spent in making trial of this idea, but the
results proved unsatisfactory, and finally the measure was abandoned altogether.

Yule [71] seems to have used the rate of population growth—ΔPop in Eq.
(1)—as a proxy for economic activity, although that creates ambiguity.
Other things being equal, population growth will by itself add to the
number of paupers; in its role as proxy, however, population growth should
reduce pauperism.

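The equations for metropolitan unions are shown below, for 1871-1881
and 1881-1891:⁶

(1871-1881)

\[ \Delta\mathrm{Paup} = 13.19 + 0.755 \times \Delta\mathrm{Out} - 0.322 \times \Delta\mathrm{Pop} - 0.022 \times \Delta\mathrm{Old} + \mathrm{residual}. \]

(1881-1891)

\[ \Delta\mathrm{Paup} = 1.36 + 0.324 \times \Delta\mathrm{Out} - 0.369 \times \Delta\mathrm{Pop} + 1.37 \times \Delta\mathrm{Old} + \mathrm{residual}. \]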

For example, one metropolitan union is Westminster. Over the period
1871-1881, the percentage changes in Out, Pop, and Old are −73, −9,
and 5, respectively. The percentage change in Paup predicted from the
regression equation is

\[ 13.19 + 0.755 \times (-73) - 0.322 \times (-9) - 0.022 \times 5 = -39. \]

The actual percentage change in Paup is −48. The “residual” is

\[ \mathrm{residual} = \mathrm{actual} - \mathrm{predicted} = -48 - (-39) = -9. \]

⁵ Also see Manski [40].

⁶ These, and the other six equations, are reported in Yule [71, Table C, p. 259]. His Table
XIX gives data for metropolitan unions, in the form of “percentage ratios” for 1871-1881
rather than differences, apparently to avoid negative numbers. The equations were fitted to
data; the numerical coefficients in the displays are estimates for the corresponding parameters
in (1); the residuals are observable, but are only approximations to unobservable
disturbance terms.

DAVID FREEDMAN

The coefficients in the regression equation are estimated so as to minimize 
the size of the residuals. (Technically, it is the sum of the squares that is 
minimized—hence the term “least squares.”) The linear combination of 
explanatory variables on the right side of the equation has therefore been 
optimized; but there is no guarantee that individual coefficients will make 
much sense. 
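The Westminster arithmetic can be checked in a few lines of Python. The coefficients and percentage changes are the ones quoted in the text; the variable names are mine, and the sketch rounds the prediction before taking the residual because that is how the text reports it.

```python
# Coefficients of the fitted equation for metropolitan unions, 1871-1881,
# as quoted in the text.
intercept, b_out, c_pop, d_old = 13.19, 0.755, -0.322, -0.022

# Percentage changes for Westminster over 1871-1881, as quoted in the text.
change_out, change_pop, change_old = -73, -9, 5
actual_change_paup = -48

predicted = intercept + b_out * change_out + c_pop * change_pop + d_old * change_old
predicted_rounded = round(predicted)               # the text reports a whole number
residual = actual_change_paup - predicted_rounded  # actual minus predicted

print(predicted_rounded, residual)
```

The unrounded prediction is a little below −39, so the residual computed without rounding is slightly smaller in absolute value than 9.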

There are some noticeable inconsistencies in Yule’s coefficients, over 
time and across the various kinds of geography. Nor are the signs of the 
coefficients entirely reasonable. These inconsistencies may not by themselves
be fatal, but they certainly raise the question of whether the
equations hold true for any well-defined population of times and places. If 
the coefficients do not have a life of their own—outside Yule’s particular 
data set—they cannot be used to answer questions of the form, “What 
would happen if you change the outrelief ratio?” The coefficients may be 
useful for descriptive purposes, but not for causal inference or even 
prediction. 

Moreover, there are familiar difficulties of interpretation. At best, Yule 
showed that changes in pauperism and the outrelief ratio were associated, 
even after adjusting for changes in the population and its age structure. 
The direction of the causal arrow, however, is by no means clear. Yule’s 
theory is that outrelief is the cause and pauperism is the effect. That is a 
reasonable view. However, the opposite idea seems equally tenable—a 
union that is flooded with paupers may not be able to build poor houses 
fast enough and resorts to outrelief. If so, pauperism causes outrelief. 
Also, Governor Pete Wilson’s theory may have some plausibility for 19th
century England if not 20th century California: unions that provide generous
outrelief attract paupers from elsewhere.⁷

Yule must have been aware of these problems. After allocating the
changes in pauperism to their various causes (including the residual), he
withdraws all causal claims with one deft sentence: “Strictly, for ‘due to’
read ‘associated with.’” [71, p. 270, footnote 25]. Yule’s paper is quite
modern in spirit, with two exceptions: he did not rely on statistical
significance, and he did not use a graph. Figure 1 brings him up to date.

⁷ According to Stigler [64, pp. 356-357], Pigou criticized Yule for ignoring “the
non-quantitative facts of the situation.... It is well known that, during recent years, those unions
in which out-relief has been restricted have, on the whole, enjoyed a general administration
much superior to that of other unions.” Stigler responds that “Pigou’s ad hoc
speculation... could not, of course, be disproved from the data Yule used.” In effect, this
allows Yule to defend himself by pleading ignorance.

Fig. 1. Yule’s model for pauperism. The figure represents Eq. (1) in graphical form. The
asterisks denote a high degree of statistical significance. To determine the asterisks, I
recomputed Yule’s regression for the metropolitan unions over the period 1871-1881, using
data in his Table XIX. I replicated his coefficients, as shown in the display, although roundoff
error is quite large:

\[
\begin{array}{rcccccccl}
\Delta\mathrm{Paup} = & 12.884 & + & 0.752 \times \Delta\mathrm{Out} & - & 0.311 \times \Delta\mathrm{Pop} & + & 0.056 \times \Delta\mathrm{Old} & + \ \mathrm{residual}, \\
\mathrm{SE} & 10.367 & & 0.135 & & 0.067 & & 0.223 & \\
t & 1.24 & & 5.57 & & -4.645 & & 0.25 &
\end{array}
\]

Under the coefficients are standard errors (SEs) and t-statistics. The SE indicates the likely
size of the difference between an estimated coefficient and its true value. The t-statistic is the
ratio of an estimate to its SE. Generally, a t-statistic above 2 or 3 in absolute value indicates
that the corresponding parameter is unlikely to be truly 0. The parameters are features of the
model, and the SEs are computed on the basis of the stochastic assumptions in the model; for
details, see the appendix. In Fig. 1, the explanatory variables are correlated; such correlations
are often signaled by curved, double-headed arrows; error terms are not shown either.
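The t-statistics reported with Fig. 1 are ratios of estimates to standard errors, and the ratios can be recomputed from the displayed numbers. The following Python sketch uses the recomputed coefficients and SEs given in the caption; the dictionary keys are labels of my own choosing.

```python
# Estimates and standard errors as displayed in the caption of Fig. 1.
coefficients = {"intercept": 12.884, "dOut": 0.752, "dPop": -0.311, "dOld": 0.056}
standard_errors = {"intercept": 10.367, "dOut": 0.135, "dPop": 0.067, "dOld": 0.223}

def t_statistics(coef, se):
    """Ratio of each estimate to its standard error."""
    return {name: coef[name] / se[name] for name in coef}

t = t_statistics(coefficients, standard_errors)
for name, value in t.items():
    flag = "beyond 2 in absolute value" if abs(value) > 2 else "within 2 of zero"
    print(f"{name:9s} t = {value:6.2f}  ({flag})")
```

The ratios agree with the displayed t-statistics up to rounding of the printed coefficients. By the rule of thumb in the caption, only the coefficients of ΔOut and ΔPop stand out; that is a statement about the model, not about causation.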


3. REGRESSION ESTIMATES AND 
CONDITIONAL EXPECTATIONS 

In the regression model (2), Y is the dependent variable, like pauperism; 
X represents the explanatory variables, like the outrelief ratio, population, 
and age structure. If all goes well, the regression equation will estimate the 
“conditional expectation” of Y given X = x, that is, the average value of Y 
corresponding to given values for the explanatory variables. 

To clarify the definitions, consider two procedures:

Procedure 1. Select subjects with X = x; look at the average of their
Y’s.

Procedure 2. Intervene and set X = x for some subjects; look at the
average of their Y’s.

These procedures are quite different. The first involves the data set as you 
find it. The second involves an intervention. 

Regression does seem to let you move from selection to intervention; 
that is why the technique is so popular. However, regression approximates 
the selection procedure, rather than intervention. Nor does the statistical 
analysis prove that the two procedures give the same results; how could it? 
Instead, causal inferences are made by assuming that selection tells you 
what would happen if you were to intervene. 

The phrase “X is exogenous” is often taken to mean that selecting on X 
will produce the same results as intervening to set the value of X —the 
basic assumption in many analyses. Exogeneity also has weaker meanings, 
to be taken up later. The ambiguity is unfortunate, because analysts may 
assume exogeneity in a weak sense and proceed as if they had established 
something more. It is only exogeneity in the strong sense defined above 
that enables you to predict the results of interventions from nonexperimental
data.
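The gap between Procedure 1 and Procedure 2 can be seen in a small simulation. The sketch below is a hypothetical setup of my own, not taken from any study: an unobserved factor U drives both X and Y, while X itself has no effect on Y at all. All names and numbers are assumptions chosen for illustration.

```python
import numpy as np

def selection_vs_intervention(n=200_000, seed=0):
    """Compare the two procedures when X has no effect on Y (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    u = rng.integers(0, 2, size=n)            # unobserved common cause
    flip = rng.random(n) < 0.1
    x_observed = np.where(flip, 1 - u, u)     # X agrees with U about 90% of the time
    y_observed = u + rng.normal(size=n)       # Y depends on U only, never on X

    # Procedure 1 (selection): compare subjects as they are found in the data.
    selection_gap = (y_observed[x_observed == 1].mean()
                     - y_observed[x_observed == 0].mean())

    # Procedure 2 (intervention): X is set by coin toss, independently of U.
    x_set = rng.integers(0, 2, size=n)
    y_after = u + rng.normal(size=n)          # Y is generated exactly as before
    intervention_gap = (y_after[x_set == 1].mean()
                        - y_after[x_set == 0].mean())
    return selection_gap, intervention_gap

selection_gap, intervention_gap = selection_vs_intervention()
print(selection_gap, intervention_gap)
```

In this setup selection shows a large difference in average Y between the two groups, while intervention shows essentially none. X is not exogenous in the strong sense, and nothing in the observed (X, Y) data reveals that.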

The distinction between selection and intervention is acknowledged 
even by the modelers (Pearl [44, p. 396]): 

Formally speaking, probabilistic analysis is indeed sensitive only to covariations,
so it can never distinguish genuine causal dependencies from spurious correlations....

Such admissions—like Yule’s [71] footnote 25—are fatal to the enterprise. 
Of course, Pearl does not give up. For instance, he goes on to say that 
experiments just provide the opportunity to observe yet more correlations, 
a move he attributes to Simon [59]. 

Figure 2 is Pearl’s [44]. On the left, it seems that X and Z cause Y:
manipulating X or Z will change Y. However, if only we had measured the
variables U and V, we might have seen that they were the joint causes of
X, Y, and Z, as in the right-hand panel. If so, manipulating X and Z will
not change Y at all. No amount of statistical analysis on the
observables—on X, Y, and Z—can tell us which panel expresses the right
theory. Indeed, matters can be arranged so that both theories lead to the
same joint distribution for the observables.

[Figure 2. Panel (a): variables X, Z, and Y. Panel (b): variables U, V, X, Y, and Z.]

Fig. 2. After Judea Pearl [44, p. 397]. Causation cannot be inferred from association by
using causal models. In panel (a), X and Z are assumed to be independent. In panel (b), U
and V are assumed to be independent; it may be shown in consequence that X and Z are
independent. Also see Duncan [12, pp. 113-127].
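One way to arrange matters so that the two theories of Fig. 2 agree on the observables is sketched below in Python. The structural equations are hypothetical choices of mine (Y = X + Z under theory (a); X = U, Z = V, Y = U + V under theory (b)); they are not taken from Pearl, and the function `simulate` is an illustrative helper.

```python
import numpy as np

def simulate(theory, n=10_000, seed=1, set_x=None):
    """Draw (X, Y, Z) under theory 'a' or 'b'; optionally intervene to set X."""
    rng = np.random.default_rng(seed)
    first = rng.normal(size=n)
    second = rng.normal(size=n)
    if theory == "a":
        x, z = first, second                  # X and Z independent
        if set_x is not None:
            x = np.full(n, float(set_x))      # intervention on X
        y = x + z                             # Y responds to X and Z
    elif theory == "b":
        u, v = first, second                  # U and V independent, unobserved
        x, z = u.copy(), v.copy()             # X and Z are driven by U and V
        y = u + v                             # Y is driven by U and V, not by X or Z
        if set_x is not None:
            x = np.full(n, float(set_x))      # intervention on X leaves Y alone
    else:
        raise ValueError("theory must be 'a' or 'b'")
    return x, y, z

obs_a, obs_b = simulate("a"), simulate("b")
do_a, do_b = simulate("a", set_x=5.0), simulate("b", set_x=5.0)
print(np.array_equal(obs_a[1], obs_b[1]), do_a[1].mean(), do_b[1].mean())
```

With the same seed the two theories produce identical observations of (X, Y, Z), so no analysis of those data can separate them. After the intervention, the average of Y moves to about 5 under theory (a) and stays near 0 under theory (b).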


4. TWO IDEAS OF CONDITIONAL PROBABILITIES 

The distinction between the two ideas of conditioning—selecting subjects
with X = x, or intervening to set X = x—seems fundamental. A
concrete example may help, and conditional probabilities are easier to deal 
with than conditional expectations. 

Many studies have demonstrated an association between cervical cancer 
and exposure to two sexually transmitted diseases—herpes and chlamydia. 
Suppose we had data as shown in Table I. The incidence rate of cervical 
cancer is 200 per 100,000 for women exposed to herpes and chlamydia (top 
left); 116 per 100,000 for women exposed to herpes but not chlamydia; and 
130 per 100,000 for those exposed to herpes, the two exposure categories 
for chlamydia being combined. Other cells may be read in a similar way. 
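As an aid to reading the combined cell: a rate for the two chlamydia categories together is a weighted average of the two separate rates, with weights given by the share of women in each category. The Python sketch below backs out the weight implied by the three rates quoted above. The resulting share is a consequence of those hypothetical rates, not a figure stated in the table.

```python
from fractions import Fraction

# Rates per 100,000 among women exposed to herpes, as quoted in the text.
rate_with_chlamydia = Fraction(200)
rate_without_chlamydia = Fraction(116)
rate_combined = Fraction(130)

# combined = w * with + (1 - w) * without, solved for the share w of
# herpes-exposed women who were also exposed to chlamydia.
w = (rate_combined - rate_without_chlamydia) / (rate_with_chlamydia - rate_without_chlamydia)
print(w)  # 1/6
```

This is bookkeeping about selected groups. It says nothing about what the rates would be after an intervention.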

With sample data, there is a role for technical statistics in estimation 
and testing—for instance, to see if the rates within a row are constant 
across columns. However, the real question is not association but causation.
Does herpes cause cervical cancer? What about chlamydia? Biotechnology
might find a way to eliminate Herpes simplex as well as Chlamydia
trachomatis. That would be a great relief, but would it reduce the incidence 
rate of cervical cancer? 

TABLE I

Rate of Cervical Cancer Cases per 100,000 Women, by Exposure to Chlamydia and Herpes

To consider the issue of causality more directly, suppose that we actually
know the rates for the population of interest, as shown in Table I.
Statistical testing must now fade into the background. The overall incidence
rate is 100 cervical cancers per 100,000 women (Table I, bottom
right). Among women exposed neither to herpes nor to chlamydia, the rate 
is lower—80 per 100,000. If cervical cancer is caused by herpes and 
chlamydia, eliminating the microorganisms responsible for those diseases 
should reduce the incidence rate of cervical cancer from 100 to 80 per
100,000. On the other hand, if the relationship is not causal, eliminating 
those microorganisms will have little effect on the incidence rate of the 
cancer. 

To be more explicit, 80/100,000 has been found by selecting women 
who are exposed to neither herpes nor chlamydia and by computing the 
incidence rate of cervical cancer for that group, one interpretation of 
conditional probability. If we intervene and eliminate the two diseases, we 
want to know the rate after the intervention; that is another interpretation. 
The two interpretations are different, because the underlying procedures 
are different. Statistical analysis of the numbers in the table, however 
refined or complex, cannot prove that a hypothetical intervention will give 
the same results as selection. This may seem obvious, even banal; but if 
you grant the point, the causal modeling game is largely over. 

What is the situation for Table I? The story is far from certain. Current
epidemiological opinion favors the idea that cervical cancer is caused by
certain strains of human papilloma virus (HPV); herpes and chlamydia
have no etiologic role, but serve only as markers for exposure to HPV. If
that opinion is correct, wiping out herpes and chlamydia will have no
impact on rates of cervical cancer.

Due in part to the rarity of cervical cancer, cohort studies do not seem 
to be available. (The numbers in Table I, although hypothetical, are not 
unreasonable.) My point is even stronger for the real studies of the 
association between cervical cancer and herpes or chlamydia. Problems 
created by incomplete data cannot simplify the task of inferring causation
from association.⁸


5. ANOTHER REGRESSION EXAMPLE 
Rindfuss et al. [55] propose a model to explain the process by which a
woman decides how much education to get, and when to have her first
child. The model illustrates many features of contemporary technique.⁹



⁸ For a discussion of the epidemiology, see Cairns [4], Peto and zur Hausen [51], Sherman
et al. [58], Hakama et al. [25], Munoz et al. [75].

⁹ I use this example because it is discussed by SGS [62, pp. 139-140].



Before we take up the model, let the authors say what they were trying to 
do: 

The interplay between education and fertility has a significant influence on the 
roles women occupy, when in their life cycle they occupy these roles, and the 
length of time spent in these roles... . This paper explores the theoretical 
linkages between education and fertility.... It is found that the reciprocal 
relationship between education and age at first birth is dominated by the effect 
from education to age at first birth with only a trivial effect in the other 
direction. [Abstract] 

No factor has a greater impact on the roles women occupy than maternity. 
Whether a woman becomes a mother, the age at which she does so, and the 
timing and number of subsequent births set the conditions under which other 
roles are assumed.... Education is another prime factor conditioning female
roles. [p. 431, footnote omitted]

The overall relationship between education and fertility has its roots at some 
unspecified point in adolescence, or perhaps even earlier. At this point aspirations
for educational attainment as a goal in itself and for adult roles that have
implications for educational attainment first emerge. The desire for education 
as a measure of status and ability in academic work may encourage women to 
select occupational goals that require a high level of educational attainment. 
Conversely, particular occupational or role aspirations may set standards of 
education that must be achieved. The obverse is true for those with either low 
educational or occupational goals. Also, occupational and educational aspirations
are affected by a number of prior factors, such as mother’s education,
father’s education, family income, intellectual ability, prior educational experience,
race, and number of siblings. [p. 432, citations omitted]

The model used by Rindfuss et al. [55] is shown in Fig. 3. The diagram 
corresponds to two linear equations in two unknowns, ED and AGE 
(variables are defined in Table II):

\[ \mathrm{ED} = a \times \mathrm{AGE} + A, \tag{3} \]

\[ \mathrm{AGE} = a' \times \mathrm{ED} + A'. \tag{4} \]

According to the model, a woman chooses her educational level and age at
first birth as if by solving these two equations for the two unknowns.

The coefficients a and a′ are “social constants,” to be estimated from
the data. The terms A and A′ take background factors into account:

\[ A = A_0 + b \times \mathrm{DADSOCC} + c_1 \times \mathrm{RACE} + \cdots + c_7 \times \mathrm{YCIG} + \text{random error drawn from a box}, \tag{5} \]

\[ A' = A'_0 + b' \times \mathrm{FEC} + c'_1 \times \mathrm{RACE} + \cdots + c'_7 \times \mathrm{YCIG} + \text{another random error drawn from a box}. \tag{6} \]

Again, the parameters A₀, b, c₁, ... are social constants to be estimated
from the data. The random errors are assumed to have mean 0, to be
statistically independent from woman to woman, and to be identically
distributed. Correlations across Eqs. (5) and (6) are permitted.

Fig. 3. The model in diagram form [55; 62, p. 140]. Variables are defined in Table II.
Explanatory variables (DADSOCC, RACE, etc.) are correlated; error terms are not shown in
the diagram.

TABLE II

Variables in the Model [55]

The endogenous variables
    ED        Respondent’s education (years of schooling completed at first marriage)
    AGE       Respondent’s age at first birth

The exogenous variables
    DADSOCC   Respondent’s father’s occupation
    RACE      Race of respondent (Black = 1, other = 0)
    NOSIB     Respondent’s number of siblings
    FARM      Farm background (coded 1 if respondent grew up on a farm, else 0)
    REGN      Region where respondent grew up (South = 1, other = 0)
    ADOLF     Broken family (coded 0 if both parents present at age 14, else 1)
    REL       Religion (Catholic = 1, other = 0)
    YCIG      Smoking (coded 1 if respondent smoked before age 16, else 0)
    FEC       Fecundability (coded 1 if respondent had a miscarriage before first birth, else 0)

Note. The data are from a probability sample of 1766 women 35-44
years of age residing in the continental United States; the sample was
restricted to ever-married women with at least one child. DADSOCC
was measured on Duncan’s scale, combining information on education
and income; missing values were imputed at the overall mean. SGS [62,
p. 139] gives the wrong definitions for NOSIB and ADOLF.

Equations (3)-(6) are not quite regression equations, due to the 
simultaneity of (3) and (4); fitting by OLS (ordinary least squares) would create 
“simultaneity bias.” Thus, Rindfuss et al. [55] use an estimation procedure 
called “two-stage least squares.” FEC does not enter into Eq. (5), nor 
DADSOCC into Eq. (6). Graphically, there is no arrow from DADSOCC 
to AGE in Fig. 3; likewise, there is no arrow from FEC to ED. These 
behavioral assumptions are critical to the statistical enterprise. Without 
them, or some similar assumptions, two-stage least squares could not be 
used. Technically, the system would not be “identifiable” (Section 11.4). 
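
The role of the exclusion assumptions can be illustrated with a minimal simulation sketch. It assumes a two-equation system with only DADSOCC and FEC as background factors; every coefficient value, the error distributions, and the helper names `ols` and `simulate` are hypothetical choices made for illustration, not estimates from Rindfuss et al. [55]. The data are generated so that the true coefficient of AGE in the ED equation is zero. OLS still reports a clearly nonzero coefficient, while two-stage least squares, which leans on the two “no arrow” assumptions, returns a value near zero.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=20_000, a=0.0, b=1.0, c=0.5, d=2.0, seed=0):
    """Draw data from a hypothetical simultaneous system:

        ED  = a * AGE + b * DADSOCC + u
        AGE = c * ED  + d * FEC     + v

    The two equations are solved jointly (the reduced form) so that
    both hold exactly for every simulated woman.
    """
    rng = np.random.default_rng(seed)
    dadsocc = rng.normal(size=n)                      # assumed exogenous
    fec = rng.binomial(1, 0.2, size=n).astype(float)  # assumed exogenous dummy
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    denom = 1.0 - a * c
    ed = (b * dadsocc + a * d * fec + u + a * v) / denom
    age = (c * b * dadsocc + d * fec + c * u + v) / denom
    return ed, age, dadsocc, fec


ed, age, dadsocc, fec = simulate()
ones = np.ones_like(ed)

# OLS on the ED equation: AGE is correlated with the error u, so the
# estimate of a is biased away from its true value of 0.
a_ols = ols(ed, np.column_stack([ones, age, dadsocc]))[1]

# Stage 1: regress AGE on the exogenous variables (FEC is the instrument).
Z = np.column_stack([ones, dadsocc, fec])
age_hat = Z @ ols(age, Z)

# Stage 2: regress ED on fitted AGE and DADSOCC; FEC is excluded here,
# which is exactly the "no arrow from FEC to ED" assumption.
a_2sls = ols(ed, np.column_stack([ones, age_hat, dadsocc]))[1]

print(f"OLS estimate of a:  {a_ols:.3f}")
print(f"2SLS estimate of a: {a_2sls:.3f}")
```

The sketch shows only that the procedure behaves well when its assumptions hold by construction; it says nothing about whether those assumptions hold for real data.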

The main empirical finding: The estimated coefficient of AGE in the 
first equation is not “statistically significant”; i.e., the coefficient a in (3) 
could be zero. The sort of woman who drops out of school to have a child 
would drop out anyway. 

If looked at coldly, the argument may seem implausible. A critique can 
be given along the following lines: 

(i) Statistical assumptions. Just why are the errors independent and 
identically distributed across the women? Independence may be 
reasonable, but heterogeneity is more plausible than homogeneity. 

(ii) The assumption of constant coefficients. Rindfuss et al. [55] are 
assuming that the same parameters apply to all women alike, from poor 
blacks in the cities of the Northeast to rich whites in the suburbs of the 
West. Why? 

(iii) Omitted variables. Surely, important variables have been omitted 
from the model, including two that were identified by Rindfuss et al. 
[55]—aspirations and ability. Malthus thought that wealth was an 
important factor. Social class matters, and DADSOCC measures only one of its 
aspects. 

(iv) What about the “no arrow” assumptions, from DADSOCC to AGE 
and FEC to ED? 

(v) Are FEC and DADSOCC exogenous? 

(vi) Are the equations “structural”? 

Questions (iv)-(vi) will be discussed in the next section, as will the idea of 
“structural” equations. 


See, e.g., Maddala [39]; for discussion, see Daggett and Freedman [9]. 

The solution to the “omitted variable” problem may seem easy—just throw some more 
variables into the model. The difficulties are explored in Clogg and Haritou [6]. Also see 
Freedman [17]. 
5.1. A Thought Experiment 

A simpler version of the model restricts attention to a more 
homogeneous group of women, where the only relevant background factors are 
DADSOCC and FEC. To make causal inferences from the data using the 
model, we need to believe that the arrows are as shown in Fig. 4, that 
DADSOCC and FEC are exogenous, and that the equations are 
“structural.” The following thought experiment may help to define the last term, 
and the empirical commitments behind the words. 

The gedanken experiment involves two groups of women. In both 
groups, fathers are randomized to jobs, and some of the daughters are 
chosen at random to have a miscarriage before their first child. (The 
statistical terminology of randomization is dry; the gedanken 
experimentalist intervenes, for instance, to make the fathers do one job rather than 
another; professors are caused to work as plumbers, and taxi drivers are 
installed as hospital anesthetists.) 

Group I. Daughters are randomized to the various levels of ED, and 
AGE is observed as the response. (The gedanken experimentalist strikes 
again, forcing some women to stay in school longer than they wish, while 
preventing others from continuing their education.) 

Group II. Daughters are randomized to the various levels of AGE, 
and ED is observed as the response. (More gedanken intervention is 
needed.) 

The statistical model can now be translated. For the women in Group I, 
AGE should not depend on DADSOCC—the “no arrow” assumption; 
however, AGE should depend linearly on ED. For the women in Group II, 
ED should not depend on FEC—the other “no arrow” assumption; 
however, ED should depend linearly on DADSOCC. The discovery of 
Rindfuss et al. [55] is that ED would not depend on AGE. 

[Path diagram with nodes DADSOCC, FEC, ED, and AGE.] 

FIG. 4. A simpler version of the model. 

Also see Pearl [46, 47]. 

There is one final assumption: the equations and parameters that 
describe the responses of the women in the experiment must also describe 
the natural situation. That is what “structural” means. For instance, a 
woman who freely chooses her educational level and her time to bear 
children does so by using the same equations as a woman made to give 
birth at a certain age. In short, with respect to the matters at issue, life in 
Des Moines proceeds more or less along the same lines as life in the 
Gulag. 

The thought experiment provides the intellectual foundation for the 
model, by articulating the background assumptions. These assumptions 
have not been subjected—cannot be subjected—to direct empirical proof, 
nor can assumptions be validated by appealing to thought experiments that 
are almost unthinkable. Do the modelers have some other method in 
reserve? If the assumptions remain unvalidated, what is the logical status 
of their implications? 

5.2. Exogeneity 

Identifying the exogenous variables is a major problem. For example, 
you can obtain results quite different from those of Rindfuss et al. [56] by 
using variables other than DADSOCC and FEC as “instruments.” 
Rindfuss et al. [56, pp. 981-982] respond that estimates made by 


instrumental variables ... require strong theoretical assumptions ... and can 
give quite different results when alternative assumptions are made ... it is 
usually difficult to argue that behavioral variables are truly exogenous and that 
they affect only one of the endogenous variables but not the other. 

In short, results can depend quite strongly on assumptions of exogeneity, 
and there is no good way to justify one set of assumptions rather than 
another. Also see Bartels [1], who comments on the impact of exogeneity 
assumptions and the difficulty of verification. 


See Hofferth and Moore [27, 42]. An “instrument” is an exogenous variable, used as part 
of the two-stage least squares estimation procedure. Some investigators may draw a 
terminological distinction: an “instrument” is exogenous, but does not appear as an explanatory 
variable in the equation being estimated. For purposes of estimation, exogenous variables are 
assumed to be independent of error terms; this does not suffice for causal inference (Section 
11). Even the independence assumption is not to be made lightly: see Clogg and Haritou [6]. 
6. AUTOMATED SEARCHES FOR CAUSALITY 

SGS [62] have computerized algorithms that search for path models. 
Using the algorithms, SGS claim to make rigorous inferences of causation 
from association. For present purposes, a “path model” is a recursive 
system of regression equations, in which the dependent variables from 
some equations are used as explanatory variables in later equations. 

The basic idea in path models is this: putative causes combine with 
parameters and random errors by multiplication and addition in order to 
produce their effects. I have discussed such models elsewhere and do not 
believe they offer much help in deducing causation from association, 
because there is little evidence to support the basic assumptions 
(Freedman [18]). To pursue the discussion here, a slightly more explicit 
definition of the models may be in order. 

Definition. A “path model” starts with variables at “level 0,” which 
are exogenous in the minimal sense that they are not explained within the 
model. Variables “at level 1” are built up as linear combinations of level 0 
variables, plus independent random errors. More generally, variables “at 
level k” are built up as linear combinations of variables at previous levels; 
again, there are additive, independent random errors. Variables at level 1, 
level 2, ... are “endogenous,” in the sense that they are explained within 
the system. The path model may be presented as a “path diagram,” like 
Fig. 1, or Fig. 5 below. Nodes represent variables in the model; if there are 
arrows from X, Y, ... to Z, then X, Y, ... are explanatory variables in the 
regression equation for Z. Nodes are often called “vertices,” and the 
diagrams are referred to as “graphs” or “causal graphs.” 
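
The definition can be made concrete with a short sketch that generates data from a small recursive system and then fits each equation by least squares. The variable names (`x`, `y`, `z`, `w`), the coefficient values, and the normal errors are all hypothetical choices for illustration; the corresponding path diagram would have arrows X → Z, Y → Z, Z → W, and X → W.

```python
import numpy as np


def simulate_path_model(n=50_000, seed=1):
    """Generate data from a hypothetical three-level path model."""
    rng = np.random.default_rng(seed)
    # Level 0: exogenous variables, not explained within the model.
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    # Level 1: linear combination of level-0 variables plus independent error.
    z = 0.8 * x - 0.5 * y + rng.normal(size=n)
    # Level 2: linear combination of variables at previous levels plus error.
    w = 0.6 * z + 0.3 * x + rng.normal(size=n)
    return x, y, z, w


def fit(response, *regressors):
    """OLS slope coefficients (an intercept is fitted but not returned)."""
    X = np.column_stack([np.ones_like(response), *regressors])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef[1:]


x, y, z, w = simulate_path_model()
z_coef = fit(z, x, y)   # equation for Z: explanatory variables X and Y
w_coef = fit(w, z, x)   # equation for W: explanatory variables Z and X

print("Z equation coefficients (X, Y):", np.round(z_coef, 2))
print("W equation coefficients (Z, X):", np.round(w_coef, 2))
```

Because the data are built from the model, equation-by-equation regression recovers the coefficients. Whether the fitted arrows represent causes or mere associations is not something the arithmetic can settle.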

The path model may represent mere association—conditional 
dependence and independence relations. Or the model may represent causation. 
I will take that up later. For now, however, either interpretation suffices. 


The model used by Rindfuss et al. [55] would not fall into this category, if ED and AGE 
really influenced each other. The SGS [62] framework excludes reciprocal causation, by 
assumption; so do path models, as I define them. However, some authors extend the 
definition of path models to include simultaneous equation models for reciprocal causation. 

SGS [62] seem to make the strong—and quite unusual—assumption that exogenous 
variables are independent of each other. That may be part of the reason why their algorithms 
estimate such peculiar models in Figs. 5 and 6 below. There is another, even more esoteric, 
point. To estimate an equation, its error term need only be assumed independent of the 
explanatory variables; if so, error terms from different equations may be correlated; then 
standard procedures for computing the correlations among the variables will not apply; see 
Freedman [18, pp. 112-114]; Seneta [57, p. 199]. SGS seem to interpret correlated errors as 
indicating the presence of “latent variables.” Such variables will be mentioned in notes to 
Figs. 5 and 6, below. 
FIG. 5. The left-hand panel shows the model reported by SGS [62]. The right-hand panel 
also shows connections among the regressors, as determined by the search program 
TETRAD. BUILD indicates that latent variables are present, i.e., errors are correlated across 
equations. BUILD asks whether it should assume “causal sufficiency”; without this 
assumption [62, p. 45], the program output is uninformative. Therefore, I told BUILD to make the 
assumption; I believe that is what SGS [62] did for the Rindfuss example. Also see Spirtes 
et al. [63, pp. 13-15]. I told BUILD that ED and AGE could not cause the remaining 
variables, following SGS [62, p. 139]. However, SGS [62] actually made the stronger 
assumption that (i) FEC, ED, and AGE could not cause YCIG, and (ii) FEC, ED, AGE, and YCIG 
could not cause the remaining variables. With the assumption of causal sufficiency, BUILD 
seems to use the PC algorithm; without the assumption, the FCI algorithm comes into play. 
Much of this information comes from Richard Scheines (personal communication). Data are 
from Rindfuss et al. [55], not SGS [62]; with the SGS covariance matrix, FARM causes REGN 
and YCIG causes ADOLF. 


Suppose the graph is “sparse”—each equation in the model involves 
relatively few variables. Suppose, too, there are no troublesome algebraic 
identities among the regression coefficients; in SGS terminology, the 
distribution is “faithful” to its graph [62, p. 35]; see Section 11.2 below. 
You have a sample—many independent realizations of variables 
X,Y, Z,... . You are willing to assume the distribution conforms to a path 
model, but do not know which model. You do not even know which 
variables are at level 0, which are at level 1, and so forth. 

SGS [62] claim their algorithms are likely to find the underlying path 
model, or a rather similar model, and in short order. Their most 
convincing evidence is based on simulation experiments, where the computer 
generates data from a path model and the SGS algorithms try to infer the 
model from the data [62, pp. 145ff, 152ff, 250ff, 320ff, 332ff]; in these 
experiments, the algorithms do very well. Roughly speaking, the SGS 
algorithms are variants of “best subsets” regression, the search being over 
graphs rather than subsets. The data come into the SGS algorithms only 
through the covariance structure. The rest of the apparatus—the 
diagrams, the Markov property, faithfulness, etc.—consists of assumptions. 

SGS [62] seem to assert that their algorithms determine causality, as a 
matter of mathematics. Such assertions are not defensible. In the SGS 
formalism, causation is obtained not by mathematical proof but by 
mathematical assumption. If you assume that the arrows in the underlying path 
diagram represent causes, then the arrows found by the algorithms 
represent causes. If you assume that the underlying arrows represent mere 
associations, then the arrows found by the algorithms represent 
associations. Causation has to do with empirical reality, not with mathematical 
proofs based on axioms. The issue is not one of theorems, but of the 
connection between theorems and reality. 

The SGS algorithms [62], like many earlier statistical procedures (factor 
analysis, LISREL, etc.), proceed by analyzing the correlation matrix of a 
set of variables. I will call such methods “correlational.” Sections 7-10 
consider applications of the SGS algorithms to real examples. Sections 
11-12 try to explain the key ideas in the SGS formalism and indicate by 
mathematical example some of the intrinsic limitations. Before 
proceeding, however, I discuss the SGS statement of assumptions. 

6.1. The SGS Statement of Assumptions 

SGS [62] discuss the role of assumptions in their theory several times 
(pp. 53-69, pp. 75-81, pp. 324-325, p. 351). However, the clearest 
statement can be found when SGS are trying to discredit the evidence that 
smoking causes lung cancer: 

effects ... cannot be predicted from ... sample conditional 
probabilities. [p. 302] 


Readers may consult the original for context, to see whether the omitted 
material affects the meaning. The advantage of the quote is clarity. If the 
statement is generally applicable, then SGS—like Yule and Pearl before 
them—have disavowed the ability to infer causation from association. 
7. THE SGS EXAMPLES 

SGS [62] share my pessimistic views about regression. They claim, 
however, that their algorithms will succeed where regression has failed: 

In the absence of very strong prior causal knowledge, multiple regression 
should not be used to select the variables that influence an outcome or 
criterion variable in data from uncontrolled studies. So far as we can tell, the 
popular automatic regression search procedures [like stepwise regression] should 
not be used at all in contexts where causal inferences are at stake. Such 
contexts require improved versions of algorithms like those described here to 
select those variables whose influence on an outcome can be reliably estimated 
by regression. In applications, the power of the specification searches against 
reasonable alternative explanations of the data is easy to determine by 
simulation ... . [p. 257] 

At first reading, SGS seems to be filled with real examples showing the 
successful application of their algorithms. That is an illusion. Many of the 
examples are based on simulation, and I set those aside. The real 
examples are mostly to be found in SGS [62, pp. 132-152, 243-256]. 

The main examples given in SGS [62] are path models. But these cannot 
withstand scrutiny—see Section 5 above and Sections 8-9 below. One 
exception is the stratification model of Blau and Duncan [3]. SGS [62, pp. 
142-145] seem to be quite critical of this model; their current position is 
almost diametrically opposite to the one in Glymour et al. [24, pp. 33-39]. 
Like SGS, I do not believe that the Blau-Duncan regressions are a 
satisfactory causal model. On the other hand, as descriptions of the data, 
the equations can tell us something important about our society: see 
Freedman [18, pp. 122, 220]. The discussion in SGS adds little to our 
understanding either of the model or of stratification. 

SGS [62] appear to use the health effects of smoking as a running 
example to illustrate their theory. Again, there is an illusion. The causal 
diagrams are all hypotheticals, no contact is made with data, and no 
substantive conclusions are drawn. If the diagrams were proposed as real 
descriptions of causal mechanisms, they would not be credible. 

What about the substantive question: does smoking cause lung cancer, 
heart disease, and many other illnesses? SGS [62] appear not to believe the 
epidemiological evidence. When they actually get down to arguing their 
case, they use a rather old-fashioned method—a literature review with 
arguments in ordinary English [62, pp. 291-302]. Causal models and search 
algorithms have disappeared. 


Simulations tell us how well the SGS algorithms do if the underlying statistical 
assumptions hold good; the assumptions are built into the computer code that generates the 
simulated data. When applying statistical algorithms to real data, a critical question is whether 
those assumptions hold. The simulations do not address such questions. 

Parallel material is in [23, pp. 13-16, 21-23]. 

See, e.g., [62, pp. IS, 216-237]. 

I approve of the method if not the implementation; the summary is 
wrong in some places and tendentious in others. However, the review does 
show the complexity of the issues. To make judgments about causation, 
you need to consider death certificate data, necropsy data, case control 
and cohort studies, twin studies, dose response curves, as well as animal 
experiments and human experiments. The force of the epidemiological 
evidence—and the SGS critique—depends on the complex interplay among 
these various studies and data sets. 

In the end, SGS [62] do not really make bottom-line judgments on the 
health effects of smoking, at least so far as I can see. Their principal 
conclusion is methodological: nobody understood the issues. 


Neither side understood what uncontrolled studies could and could not 
determine about causal relations and the effects of interventions. The statisticians 
pretended to an understanding of causality and correlation they did not have; 
the epidemiologists resorted to informal and often irrelevant criteria, appeals to 
plausibility, and in the worst case to ad hominem .... While the statisticians 
didn’t get the connections between causality and probability right, the 
“epidemiological criteria for causality” were an intellectual disgrace, and the 
level of argument ... was sometimes more worthy of literary critics than 
scientists. [62, pp. 301-302] 


Part of a sentence in SGS [62, p. 4] does seem to grant one of the major 
claims made by the epidemiologists, “smoking does cause lung cancer.” 
But that only complicates the puzzle. If you don't believe the evidence, 
why accept the claim? 

Despite SGS [62], the epidemiologists did have a good understanding of 
the issues and made a strong case against smoking. The arguments were 
imperfect, and some reasonable doubts may remain. But the data, taken 
all in all, are compelling. The epidemiological literature on smoking is far 
stronger than anything I have seen in the social sciences. For a survey of 
the evidence, see Cornfield et al. [7]; this paper is still worth reading. More 
recent data are reviewed in [30]. 

SGS [62] elected not to use their analytical machinery on the smoking 
data—a remarkable omission. When applied to the examples that SGS 
actually chose, the algorithms produce one small disaster after another, as 
will now be seen. In sum, SGS [62] claim to have developed techniques for 
generating causal models; but they do not have any success stories. 
8. USING THE SGS SEARCH PROCEDURE 

The SGS search procedures are embodied in a computer program called 
TETRAD [62]. Version 2.1 of this program was kindly provided by Richard 
Scheines and Peter Spirtes. The BUILD module is the part of TETRAD 
used to discover path models with no latent variables. I ran BUILD on two 
examples—Rindfuss et al. [55] and AFQT (to be discussed in Section 9). 

8.1. Rindfuss et al. 

To explain AGE (age at first birth) in the Rindfuss et al. [55] example, 
the SGS [62] algorithms select the variables shown in Table III. Regression 
estimates for the coefficients, based on summary data in SGS, are reported 
in the first three columns of the table. The coefficients for ADOLF (the 
indicator for women from broken homes) and YCIG (an indicator for 
smoking by age 16) have positive signs. That is paradoxical: women from 
broken homes and women who smoke should be having children earlier, 
not later. The signs should be negative, not positive. SGS do not 
comment on this issue. 


TABLE III 

The SGS [62] model for age at first birth, computed using the SGS covariance matrix or the 
Rindfuss et al. [55] covariance matrix 

            SGS covariance               Rindfuss et al. covariance 
            (R² = 0.27)                  (R² = 0.24) 
            Estimate    SE       t       Estimate    SE       t 
  RACE       -1.66     .30    -5.50       -1.66     .30    -5.46 
  REGN       -0.56     .19    -3.01       -0.63     .19    -3.35 
  ADOLF       1.89     .22     8.60        2.01     .22     8.98 
  YCIG        2.14     .25     8.63       -0.89     .25    -3.53 
  FEC         2.72     .28     9.70        2.77     .28     9.72 
  ED          0.67     .04    18.00        0.60     .04    15.72 

Note. (i) Intercepts are not reported; OLS estimates. 

(ii) The first column in Table III shows parameter estimates. The second shows standard 
errors, or SEs, which indicate the likely size of the differences between the estimates and the 
true parameter values. The t-statistics in the third column are the ratios of estimates to SEs. 
Generally, a t-statistic above 2 or 3 in absolute value indicates that the corresponding 
parameter is unlikely to be truly 0. For details, see the Appendix. 

(iii) The parameters are features of the model, and the SEs are computed using the model. 
If you do not believe in the existence of the parameters apart from the data, or do not accept 
the statistical assumptions in the model, the SEs and t-statistics are likely to be meaningless. 
In any case, performing multiple tests—as in a search algorithm—complicates the 
interpretation of the t-statistics [17, 23]. 

(iv) R² is generally interpretable as a descriptive statistic, whether or not the assumptions 
of the model hold true. An R² of 0.27 indicates that about 27% of the variance in AGE has 
been explained; that is not much, and models in the social science literature often have even 
less explanatory power. For a discussion of R², see [20, pp. 78-81]. 

(v) According to current epidemiological opinion, smoking does have some biological 
effect, delaying conception by several weeks. However, the women who choose to smoke are 
different from the nonsmokers and have their first child almost a year earlier. This effect 
remains even after controlling for the measured background factors in the regression; the 
coefficient of YCIG is −0.89 years. 

Rindfuss et al. [55] give standard deviations and correlations for their 
data; SGS [62, p. 139] used these statistics to compute a covariance matrix, 
but reversed some of the signs. The last three columns of Table III report 
regression estimates computed from the correct covariances. The problem 
with YCIG disappears, but the sign for ADOLF stays positive. Anyone can 
make a mistake entering data; ignoring paradoxical signs in a causal model 
is quite another matter. 
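
The step from standard deviations and correlations to covariances is simple arithmetic: each covariance is the correlation multiplied by the two standard deviations. The sketch below uses made-up numbers for two variables (they are not the Rindfuss et al. statistics), and the function name `covariance_from_correlation` is a hypothetical helper. It shows how reversing the sign of one correlation reverses the sign of the corresponding covariance, which is the kind of data-entry slip described above.

```python
import numpy as np


def covariance_from_correlation(sd, corr):
    """Covariance matrix with entries sd[i] * sd[j] * corr[i][j]."""
    sd = np.asarray(sd, dtype=float)
    corr = np.asarray(corr, dtype=float)
    return np.outer(sd, sd) * corr


# Hypothetical summary statistics for two variables.
sd = [2.0, 0.5]
corr = [[1.0, -0.3],
        [-0.3, 1.0]]

cov = covariance_from_correlation(sd, corr)

# The same statistics with the sign of the correlation reversed.
corr_flipped = [[1.0, 0.3],
                [0.3, 1.0]]
cov_flipped = covariance_from_correlation(sd, corr_flipped)

print(cov)
print(cov_flipped)
```

The variances on the diagonal are unaffected by the sign error; only the off-diagonal entries change, which is why such a slip can pass unnoticed until a regression coefficient comes out with an implausible sign.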

SGS [62] report only a graphical version of their model. They say, 

Given the prior information that ED and AGE are not causes of the other 
variables, the PC algorithm (using the .05 significance level for tests) directly 
finds the model [in Figure 5(a)] where connections among the regressors are 
not pictured. [62, p. 139] 

However, connections among regressors can be of interest. Although 
TETRAD is supposed to discover the causal ordering of explanatory 
variables, it produces the very strange model shown in Fig. 5(b). For 
example, the model says that race and religion cause region of residence. 
Comments on the sociology may be out of place, but consider the statistics. 
The equation is 

$$\text{REGN} = a + b \times \text{RACE} + c \times \text{REL} + e. \tag{7}$$

REGN is a dummy variable, coded 1 for respondents who grew up in the 
South, 0 for others; RACE is 1 for black respondents and 0 for others; 
REL is 1 for Catholics, 0 for others; e is normally distributed. In 
consequence, this equation forces impossible values on REGN: the left-hand 
side is 0 or 1, the right-hand side varies from −∞ to +∞. Now R² is only 
0.16, so e contributes most of the variance; Eq. (7) can hardly be defended 
as an approximation. Having dummy variables in the middle of path 
diagrams is a blunder. (FARM creates a similar problem; so does NOSIB, 
although less extreme.) In short, the SGS algorithms have produced a 
model that fails the most basic test—internal consistency. 
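
The inconsistency is easy to see numerically. The following sketch draws values from the right-hand side of an equation of the form of Eq. (7), using hypothetical coefficients and a hypothetical error standard deviation (none of these numbers comes from the TETRAD output), and checks how many of the generated values are legitimate codes for a dummy variable.

```python
import numpy as np


def simulate_regn(n=10_000, a=0.2, b=0.4, c=-0.1, sigma=0.4, seed=0):
    """Right-hand side of a + b*RACE + c*REL + e with normal e.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    race = rng.integers(0, 2, size=n)    # dummy: 0 or 1
    rel = rng.integers(0, 2, size=n)     # dummy: 0 or 1
    e = rng.normal(0.0, sigma, size=n)   # normally distributed error
    return a + b * race + c * rel + e


regn = simulate_regn()

# Share of generated values that are valid codes for a 0/1 variable.
share_valid = np.mean(np.isin(regn, [0.0, 1.0]))
# Share of generated values falling outside the interval [0, 1] altogether.
share_outside = np.mean((regn < 0.0) | (regn > 1.0))

print(f"share equal to 0 or 1: {share_valid:.3f}")
print(f"share outside [0, 1]:  {share_outside:.3f}")
```

With a continuous error term, essentially none of the generated values equals 0 or 1, and a substantial share falls outside the unit interval. No choice of coefficients repairs this; the problem lies in the form of the equation.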

Smoking, broken homes, and early childbearing seem to be correlates of social 
disadvantage and indicators of personality traits. DADSOCC and RACE are quite imperfect controls 
for family background; therefore, YCIG and ADOLF are likely to pick up the effects of 
background, as well as the effects of omitted personality variables. See note (v) to Table III. 
This sort of bias is discussed in Section 12.2 below. Also see Clogg and Haritou [6]. 
9. THE ARMED FORCES QUALIFICATION TEST 

SGS [62] discuss an example based on the Armed Forces Qualification 
Test (AFQT). The AFQT is a linear combination with fixed weights of 
scores on certain subtests. Some of these subtests, as well as subtests that 
are not part of the AFQT, are listed in Table IV. The problem is to decide 
which subtests go into the AFQT and which do not. 

The problem may be stated more algebraically as 

$$\text{AFQT score} = a_1 \times \text{NO} + a_2 \times \text{WK} + \cdots + a_7 \times \text{GS} + b_1 \times \text{UN}_1 + \cdots + b_n \times \text{UN}_n, \tag{8}$$

where $\text{UN}_1, \ldots, \text{UN}_n$ are unobservable. Some of the a’s are zero, and the 
challenge is to figure out which ones. 

We have data on 6224 subjects, summarized as a covariance matrix. 
According to SGS [62, pp. 243-244]: 

a linear multiple regression of AFQT on the other seven variables gives 
significant regression coefficients to all seven and thus fails to distinguish the 
tests that are in fact linear components of AFQT. ... Given the prior 
information that AFQT is not a cause of any of the other variables, the PC algorithm in 
TETRAD II correctly picks out {AR, NO, WK} as the only ... variables that can 
be components of AFQT.... 

To test the claims about regression, I ran AFQT on all the observable 
subtests. As Table V shows, EI and MC are related to AFQT only at the 
chance level. Moreover, MK and GS have negative coefficients, but 
psychometric practice frowns on subtests that are negatively related to 
overall test scores. It is a natural conjecture that NO, WK, and AR go into 
AFQT, while the other four subtests do not. Contrary to the claims of 
SGS, the AFQT can be handled by ordinary statistical methods. 


TABLE IV 

Subtests Analyzed by SGS [62] 

  1. Numerical Operations        NO 
  2. Word Knowledge              WK 
  3. Arithmetical Reasoning      AR 
  4. Mathematical Knowledge      MK 
  5. Electronics Information     EI 
  6. Mechanical Comprehension    MC 
  7. General Science             GS 

Note. Some go into the AFQT and some do not. 

SGS [62, p. 243]. Institutional background on the AFQT will be found in Section 12.5. 
TABLE V 

Regression of AFQT on All the Observable Subtests 

          Estimate     SE        t 
  NO        0.24      .022     10.8 
  WK        1.17      .029     40.5 
  AR        1.03      .028     36.4 
  MK       -0.24      .028     -8.7 
  EI       -0.03      .024     -1.3 
  MC        0.03      .024      1.3 
  GS       -0.13      .029     -4.6 

Note. Variables were centered at their means. 

The AFQT problem is in some ways quite easy. By definition, the 
“causes” or subtests combine linearly with the parameters to produce the 
AFQT as an “effect.” Joint normality of test scores seems to follow from 
the procedures used to construct the tests: consequently, scores on any one 
subtest can be presented as a linear combination of other subtest scores, 
with additive random errors. Thus, critical issues in most empirical studies 
have disappeared. 

9.1. TETRAD 

According to SGS, given the prior information that AFQT does not 
cause the other variables, TETRAD correctly picks out AR, NO, and WK 
as the components of the AFQT. Without that prior information, 
however, TETRAD declares AFQT to be the cause of these subtests, rather 
than the effect. With the prior information, TETRAD produces the strange 
results shown in Figure 6. Now, for instance, the subtest NO may 
“cause” the overall test score AFQT, but it can hardly cause the other 
subtests AR or MK. Furthermore, there is a cycle in the figure: 

MC → AR → WK → GS → MC. 

In principle, such cycles were excluded by prior assumption, as well they 
might be. Subtests should not cause themselves, even indirectly. To sum 
up: 

(i) ordinary least squares techniques pick out NO, AR, and WK for 
the probable components of the AFQT, just as TETRAD does; 

(ii) TETRAD produces the curious model in Figure 6. 

On the other hand, unobserved variables may create serious problems (Section 12.4). 

SGS [62, p. 243]. 

The program output is given in Spirtes et al. [63, pp. 10-11]. 
FIG. 6. AFQT and its subtests arranged in causal order by the search program TETRAD. 
I believe SGS [62, pp. 243-244] used BUILD, with the assumption of causal sufficiency, for 
the AFQT example. Also see Spirtes et al. [63, pp. 8-11]. The program indicates there are 
latent variables, i.e., correlations in the errors. 


10. FOREIGN INVESTMENT AND POLITICAL 
OPPRESSION 

As noted in Section 7, SGS are quite pessimistic about typical social-science
applications of regression. While I agree with the bottom line, their
specific objections seem misplaced. One example is enough to make the
point. Timberlake and Williams [65] offer a regression model to explain
political exclusion (PO) in terms of foreign investment (FI), energy development
(EN), and civil liberties (CV). High values of PO correspond to
authoritarian regimes that exclude most citizens from political participation;
high values of CV indicate few civil liberties. Data come from 72
countries. Correlations among the Timberlake-Williams variables are
shown in Table VI.

The equation proposed by Timberlake and Williams [65] is 

$$\mathrm{PO} = a + b \times \mathrm{FI} + c \times \mathrm{EN} + d \times \mathrm{CV} + \text{error}. \tag{9}$$

Empirical results are shown in the first three columns of Table VII. The
estimated coefficient of FI is significantly positive and is interpreted as
measuring the effect of foreign investment on political exclusion;
see Timberlake and Williams [65, p. 143].

TABLE VI

The Timberlake and Williams Correlation Matrix

          PO       FI       EN       CV
PO      1.000    -.175    -.480     .868
FI      -.175    1.000     .330    -.391
EN      -.480     .330    1.000    -.430
CV       .868    -.391    -.430    1.000

Note. Correlation matrix for political oppression (PO),
foreign investment (FI), energy development (EN), and civil
liberties (CV). Source: [62, p. 249].

SGS discuss this example [62, pp. 248-250], suggesting that Timberlake
and Williams have confused cause and effect. The alternative causal
sequence is not spelled out. Presumably, the idea is that dictators “cause”
foreign investment in the sense that investors think dictatorial regimes
offer greater stability, etc.

The main step in the SGS statistical argument comes down to this: the
correlation of -0.175 between political exclusion and foreign investment
is at the chance level. The calculation rides on two assumptions: (i) the 72
countries in the data set are a random sample from some much larger set
of countries and (ii) the variables follow a multivariate normal distribution.
These time-honored but madcap assumptions are not stated explicitly by
SGS, let alone justified. (Of course, the assumptions behind the statistics
in Timberlake and Williams might seem equally antic.)

TABLE VII

The Timberlake and Williams Model

             $R^2$ = .81                $R^2$ = .93
       Estimate    SE      t      Estimate    SE      t
FI       .23      .059     3.9      .44      .036     12
EN      -.18      .060    -2.9     -.22      .037     -6
CV       .88      .061    14.4      .95      .038     25

Note. Political exclusion (PO) is regressed on foreign investment (FI), energy
development (EN), and civil liberties (CV). The first three columns show results
for the observed correlation matrix (Table VI). The last three columns show what
happens when r(PO, FI) is set to 0. Coefficients in Table VII are standardized, that
is, computed from variables standardized to have mean 0 and variance 1. The
coefficients reported by SGS [62, p. 249] are not standardized and therefore do not
match the correlation matrix.

However, for the sake of argument, let us grant SGS [62] their assumptions.
On that basis, the standard error for the correlation in question is
about .12. I change the suspect correlation coefficient from its
observed value of -0.175 to the new value of 0, a difference of about 1.5
SEs. I then recompute the model (last three columns in Table VII). The
results are even better for Timberlake and Williams: the estimated coefficients
are bigger and more significant; the signs stay the same; and
$R^2$ moves closer to 1.
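The recomputation can be checked from the published correlations alone. The following is a minimal sketch, assuming NumPy and the usual least-squares formulas for standardized variables; the function name `standardized_fit` and the large-sample approximation 1/sqrt(n) for the standard error of a correlation are illustrative choices, not taken from the paper.

```python
import numpy as np

# Correlation matrix from Table VI, in the order PO, FI, EN, CV.
R = np.array([
    [1.000, -0.175, -0.480, 0.868],
    [-0.175, 1.000, 0.330, -0.391],
    [-0.480, 0.330, 1.000, -0.430],
    [0.868, -0.391, -0.430, 1.000],
])
N_COUNTRIES = 72


def standardized_fit(corr, n):
    """Regress variable 0 on the others, using only the correlation matrix."""
    r_xx = corr[1:, 1:]          # correlations among explanatory variables
    r_xy = corr[1:, 0]           # correlations with the response
    beta = np.linalg.solve(r_xx, r_xy)
    r2 = float(r_xy @ beta)
    k = len(beta)
    resid_var = (1.0 - r2) / (n - k - 1)
    se = np.sqrt(resid_var * np.diag(np.linalg.inv(r_xx)))
    return beta, se, r2


# Observed matrix (first three columns of Table VII).
beta_obs, se_obs, r2_obs = standardized_fit(R, N_COUNTRIES)

# Same matrix with r(PO, FI) set to 0 (last three columns of Table VII).
R_new = R.copy()
R_new[0, 1] = R_new[1, 0] = 0.0
beta_new, se_new, r2_new = standardized_fit(R_new, N_COUNTRIES)

# Approximate SE of a sample correlation near 0 (assumption: 1/sqrt(n)).
se_of_r = 1.0 / np.sqrt(N_COUNTRIES)

# The modified matrix is a legitimate correlation matrix if this is positive.
min_eig_new = float(np.linalg.eigvalsh(R_new).min())
```

Printing `beta_obs / se_obs` and `beta_new / se_new` gives the t-statistics for comparison with Table VII.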

I will not defend the model any further. Measurement problems are 
extreme, and the list of omitted variables very long. SGS may well be right, 
that cause and effect have been confused. But the demonstration is 
peculiar. The correlation matrix cannot show that FI, EN and CV cause 
PO—the fatal flaw in the Timberlake-Williams model. (Of course, 
Timberlake and Williams are not alone in this respect.) Nor can the matrix 
show that FI, EN and CV do not cause PO—the corresponding flaw in 
SGS. Indeed, it is trivial to construct four variables labelled FI, EN, CV 
and PO, such that FI, EN and CV do cause PO; but sample correlation
matrices will look rather like the one in Table VI. This only sharpens the 
basic question. What do any of these calculations tell us about the world 
outside the computer? 


11. SOME MATHEMATICAL ISSUES 

Sections 11 and 12 address by mathematical example two questions: 

(i) To what extent can correlational methods recover an underlying 
path diagram? 

(ii) When can the arrows in the diagram be interpreted as indicating 
causation, rather than conditional independence and dependence? 

The examples will indicate how SGS [62] use the “faithfulness” assumption
to help them answer such questions. Issues of identifiability and
consistency will be discussed, and methodological contributions in SGS will
be delineated. Sections 11 and 12 are more technical than previous
material; readers can skip to Section 13 without losing the thread of the
argument.

The focus is on linear models. Suppose you have a covariance matrix
that describes certain variables. Assume these variables are jointly normal,
with mean 0; that avoids all questions of linearity etc. and all problems
created by having only finite amounts of data. However, the statistical
procedures I am considering, like the SGS algorithms, will operate on
that covariance matrix and on nothing else. Such procedures may be called
“correlational.”

The new matrix is still positive definite, so it is a legitimate correlation matrix. Section
12.1 discusses the connection between the Timberlake-Williams model and the faithfulness
assumption. Also see Cartwright [5, pp. 79-84].

Path models were defined in Section 6. Briefly, you start with variables
at level 0; variables at level k are linear combinations of variables at lower
levels, plus independent random errors. In a path diagram, nodes represent
variables. There is an arrow from X to Y if X is used as an
explanatory variable in the equation for Y.

Exogeneity is a critical concept. As indicated before, the term is used in
at least three senses. The weakest definition is purely mechanical: exogenous
variables are not explained within the model, but are supplied to the
model. Variables at level 0 in a path model are exogenous in this minimal
sense. A more restrictive definition: exogenous variables are statistically
independent of the error terms in the equations. The third idea is the one
that is relevant to causal inference: X is exogenous if selecting subjects
with X = x gives the same results as intervening to set X = x.

There are tests for exogeneity in the literature, as well as model 
specification tests. However, these have limited relevance to causal inference.
For example, Hausman [26] assumes that certain variables are known
a priori to be exogenous and then tests whether other variables are 
exogenous; he interprets exogeneity as orthogonality to disturbance terms. 
He also has a test that detects correlation between errors from equations 
in a path model. White [69, 70] focuses on similar issues—for instance, 
testing whether the variables have a jointly normal distribution. 

Another reference in the econometric literature is Engle, Hendry, and
Richard [15]. These authors distinguish several kinds of exogeneity; “strict”
exogeneity means independence of variables and error terms, but only
“super” exogeneity permits estimating the effects of interventions. Examples
are given to illustrate the definitions [15, pp. 287-294]. There is
further discussion in Leamer [35].


11.1. The Basic Statistical Problem

Suppose you have n random variables with a jointly normal distribution;
all the variables have mean 0, and you know the covariance matrix, which
is positive definite. You wish to present this covariance matrix as a path
model. In a sense, nothing is easier. Simply order the variables, arbitrarily,
as $X_1, X_2, \ldots, X_n$. By successively applying regression, we can find coefficients
$a_{ij}$ and error terms $\epsilon_i$ such that $X_1, \epsilon_2, \ldots, \epsilon_n$ are all independent
with mean 0, and Eq. (10) holds:

$$
\begin{aligned}
X_2 &= a_{21} X_1 + \epsilon_2 \\
X_3 &= a_{31} X_1 + a_{32} X_2 + \epsilon_3 \\
&\;\;\vdots \\
X_n &= a_{n1} X_1 + a_{n2} X_2 + \cdots + a_{n,n-1} X_{n-1} + \epsilon_n.
\end{aligned} \tag{10}
$$

Then $X_1$ is presented as exogenous and the “cause” of $X_2$; next, $X_1$ and
$X_2$ “cause” $X_3$; and so forth. In short, there are many ways to present a
covariance matrix as a path diagram; few if any will be relevant for causal
inference.
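The construction in (10) can be sketched in a few lines. This is an illustration only, assuming NumPy; the covariance matrix `SIGMA` below is a hypothetical example, and the helper names `path_model` and `implied_cov` are not from the paper. The point is that every ordering of the variables yields a recursive model that reproduces the same covariance matrix.

```python
import numpy as np


def path_model(cov, order):
    """Successive regression: each variable on all earlier ones in `order`.

    Returns the coefficient matrix (row k holds the coefficients in the
    equation for the k-th variable) and the error variances.
    """
    cov = np.asarray(cov, dtype=float)
    s = cov[np.ix_(order, order)]
    n = len(order)
    coef = np.zeros((n, n))
    err_var = np.zeros(n)
    err_var[0] = s[0, 0]                      # first variable is "exogenous"
    for k in range(1, n):
        coef[k, :k] = np.linalg.solve(s[:k, :k], s[:k, k])
        err_var[k] = s[k, k] - s[:k, k] @ coef[k, :k]
    return coef, err_var


def implied_cov(coef, err_var):
    """Covariance matrix generated by the recursive equations."""
    n = len(err_var)
    total = np.linalg.inv(np.eye(n) - coef)
    return total @ np.diag(err_var) @ total.T


# Hypothetical covariance matrix for three variables (not from the paper).
SIGMA = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])

forward = path_model(SIGMA, [0, 1, 2])    # variable 0 "causes" 1, then 2
backward = path_model(SIGMA, [2, 1, 0])   # variable 2 "causes" 1, then 0
```

Both `forward` and `backward` regenerate `SIGMA` (up to the reordering of rows and columns), so the covariance matrix by itself cannot favor one ordering.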

11.2. The Faithfulness Assumption 

How can you single out one path diagram from the many that correspond
to a given covariance matrix? At this point, SGS [62] seem to use the
“faithfulness” assumption; this assumption is also used to handle confounding,
as discussed in Section 12.1 below. Basically, a covariance matrix
is faithful to a diagram provided conditional dependencies and independencies
are determined by the presence or absence of arrows in the
diagram, rather than specific numerical values of parameters.

By way of example, Fig. 7 shows two path diagrams. On the left, X
causes W through the intervening variables Y and Z; on the right, the flow
of causality is reversed. The lower case letters on the arrows stand for
“path coefficients,” that is, standardized regression coefficients. How could 
SGS [62] distinguish between the two theories in the figure? Their idea 
seems to be as follows: 

In the left-hand diagram, Y and Z are conditionally independent given X; on
the right, however, Y and Z are conditionally dependent given X.

Another contrast:

In the left-hand diagram, Y and Z are conditionally dependent given W; on the
right, however, Y and Z are conditionally independent given W.


For the construction in (10), simply choose $a_{21}$ so $E\{X_2 | X_1\} = a_{21} X_1$; choose $a_{31}$ and $a_{32}$
so $E\{X_3 | X_1, X_2\} = a_{31} X_1 + a_{32} X_2$; and so forth. For details, see the Appendix. Since the
ordering of the variables in (10) is arbitrary, fitting such equations or drawing path diagrams
cannot determine which variables are causes and which are effects. In particular, $X_1$ may be
exogenous in the sense that it is statistically independent of disturbance terms; that by itself
does not suffice to estimate the results of manipulating $X_1$, since we cannot tell whether $X_1$
is a cause or an effect.

In this section, I use “cause” in its ordinary (perhaps undefinable) sense. However, the
technical point, about the possibility of estimating path diagrams from covariance matrices,
still holds if the arrows are interpreted as merely representing association. “Causation” is
then colorful shorthand (perhaps too colorful) for a certain kind of covariation.


Fig. 7. If two path diagrams have the same covariance matrix, correlational methods
cannot tell them apart; the faithfulness assumption is made to rule out such problems. The
lower case letters on the arrows denote “path coefficients,” that is, standardized regression
coefficients.


Therefore, the pattern of conditional dependence and independence identifies
the diagram. (In both diagrams, X and W are conditionally independent
given Y and Z.)

This idea works for many path diagrams, but fails for others. Indeed, the 
path coefficients can be chosen so the pattern of conditional dependence 
and independence is the same in the two diagrams. Even worse, both 
diagrams can give rise to the same covariance matrix—so correlational 
methods cannot tell which is right. SGS [62] make the “faithfulness 
assumption” in order to rule out such indeterminacies. (The workings of 
the assumption will be explained below.) 

However, that only moves the difficulty to another place. Faithfulness is 
hardly an empirical fact; it is an assumption about unobservables, made to 
rule out situations that cannot be handled by correlational methods. The
SGS analytical program can now be stated rather simply. If the arrows in a 
path diagram represent causation not association, and if the path diagram 
can be estimated from data, then SGS can indeed infer causation from 
association. 

The balance of Section 11.2 provides technical backup; readers can skip 
to Section 11.3. The left-hand panel in Fig. 7 is described by 

$$Y = aX + \delta_1, \quad Z = bX + \delta_2, \quad W = cY + dZ + \delta_3. \tag{11}$$

In this equation, $X, \delta_1, \delta_2, \delta_3$ are independent and normal, with mean 0;
X, Y, Z, W all have variance 1. The covariance matrix of X, Y, Z, W can be
computed from the four parameters a, b, c, d as shown in (12):

$$
\begin{array}{c|cccc}
 & X & Y & Z & W \\ \hline
X & 1 & a & b & ac + bd \\
Y & a & 1 & ab & c + abd \\
Z & b & ab & 1 & d + abc \\
W & ac + bd & c + abd & d + abc & 1
\end{array} \tag{12}
$$
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It is a little theorem, which follows by a tedious calculation from (48) in 
the Appendix, that 

$$\operatorname{cov}(X, W \mid Y, Z) = 0. \tag{13}$$

This is an example of a conditional independence relation forced by a 
graph; (13) holds whatever the path coefficients in Fig. 7 may be. 

The diagram on the left in Fig. 7 is reversible, provided 

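$$\operatorname{cov}(Y, Z \mid W) = 0. \tag{14}$$

By (48) below, Eq. (14) is equivalent to

$$\operatorname{cov}(Y, Z) = \operatorname{cov}(Y, W) \times \operatorname{cov}(Z, W). \tag{15}$$

By (12), this means

$$ab = (c + abd)(d + abc). \tag{16}$$

Rearranging (16) gives the quadratic equation

$$cd\,(ab)^2 - (1 - c^2 - d^2)\,ab + cd = 0. \tag{17}$$

One solution to (17) is

$$ab = \frac{1 - c^2 - d^2 - \sqrt{(1 - c^2 - d^2)^2 - 4c^2 d^2}}{2cd}. \tag{18}$$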


I chose a, c, d more or less at random, getting 0.1925, 0.2873, and
0.1245, respectively. I computed b from (18), getting 0.2063. This choice
forces the conditional independence relation (14) and violates the faithfulness
assumption; conditional independence comes from the parameter
values, not the presence or absence of arrows.

There was a bit of luck here, because some values for a, c, d will not produce correlation
matrices.
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Given the values for the four parameters a, b, c, d, the covariance 
matrix (12) can be evaluated as 


$$
\begin{pmatrix}
1.0000 & 0.1925 & 0.2063 & 0.0810 \\
0.1925 & 1.0000 & 0.0397 & 0.2922 \\
0.2063 & 0.0397 & 1.0000 & 0.1359 \\
0.0810 & 0.2922 & 0.1359 & 1.0000
\end{pmatrix} \tag{19}
$$


The path coefficients in the right-hand panel of Fig. 7 are easily
computed from (19):

the path coefficient from W to Y is c' = cov(Y, W) = 0.2922;
the path coefficient from W to Z is d' = cov(Z, W) = 0.1359;
the path coefficients from Y and Z to X are obtained by multiple
regression, as a' = 0.1846 and b' = 0.1990.

With these choices, faithfulness does not hold and (19) can be represented
by either diagram in Fig. 7. (For details on multiple regression, see the
Appendix.) In effect, the faithfulness assumption precludes certain algebraic
identities among the parameters, like (16). Since parameters are not
observable, the faithfulness assumption is not subject to direct empirical
tests based on finite amounts of data.
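The numerical example can be replayed as a check. The sketch below assumes NumPy and takes a, c, d from the text; b is obtained by solving the quadratic (17) for the product ab (smaller root) and dividing by a. The helper names are illustrative, not the author's. It confirms that the left-hand and right-hand diagrams of Fig. 7 generate one and the same covariance matrix for this parameter choice.

```python
import numpy as np


def b_from_quadratic(a, c, d):
    """Smaller root of (17) for the product ab, divided by a."""
    s = 1.0 - c**2 - d**2
    ab = (s - np.sqrt(s**2 - 4.0 * c**2 * d**2)) / (2.0 * c * d)
    return ab / a


def cov_left(a, b, c, d):
    """Covariance matrix (12) of X, Y, Z, W for the left-hand diagram."""
    ab = a * b
    return np.array([
        [1.0, a, b, a * c + b * d],
        [a, 1.0, ab, c + ab * d],
        [b, ab, 1.0, d + ab * c],
        [a * c + b * d, c + ab * d, d + ab * c, 1.0],
    ])


def cov_right(a2, b2, c2, d2):
    """Covariance of X, Y, Z, W when W causes Y and Z, which cause X."""
    yz = c2 * d2
    xy = a2 + b2 * yz
    xz = b2 + a2 * yz
    xw = a2 * c2 + b2 * d2
    return np.array([
        [1.0, xy, xz, xw],
        [xy, 1.0, yz, c2],
        [xz, yz, 1.0, d2],
        [xw, c2, d2, 1.0],
    ])


def cond_cov(cov, i, j, given):
    """Conditional covariance of variables i and j given the listed ones."""
    g = list(given)
    return cov[i, j] - cov[i, g] @ np.linalg.solve(cov[np.ix_(g, g)], cov[g, j])


a, c, d = 0.1925, 0.2873, 0.1245
b = b_from_quadratic(a, c, d)
sigma = cov_left(a, b, c, d)

X, Y, Z, W = range(4)
c_rev = sigma[Y, W]                        # path coefficient W -> Y
d_rev = sigma[Z, W]                        # path coefficient W -> Z
a_rev, b_rev = np.linalg.solve(            # regression of X on Y and Z
    sigma[np.ix_([Y, Z], [Y, Z])], sigma[[Y, Z], X]
)
sigma_rev = cov_right(a_rev, b_rev, c_rev, d_rev)
```

Here `cond_cov(sigma, Y, Z, [W])` vanishes because of the parameter values, which is exactly the violation of faithfulness described in the text, while `cond_cov(sigma, X, W, [Y, Z])` vanishes for any choice of coefficients.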

11.3. Complete Graphs 

Even if the covariance matrix is faithful to a graph, however, problems 
of indeterminacy remain—particularly if the graph is “complete” in the 
sense that every pair of vertices is joined by an arrow. Figure 8 illustrates 
this indeterminacy. The same covariance matrix (20) for the variables
X, Y, Z is represented either by the diagram in panel (a) or the one in
panel (b), where the flow of “causality” is reversed:

$$
\begin{array}{c|ccc}
 & X & Y & Z \\ \hline
X & 1 & .46 & .50 \\
Y & .46 & 1 & .42 \\
Z & .50 & .42 & 1
\end{array} \tag{20}
$$

Fig. 8. Graphs (a) and (b) have the same covariance matrix. Both are complete; there is
an arrow from every variable to every other variable. The numbers on the arrows are path
coefficients, that is, standardized regression coefficients.


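For a second example of indeterminacy when the graph is complete,
consider four variables X, Y, Z, W with covariance matrix Σ given by

Σ = [4 × 4 matrix; entries illegible in the source]. (21)

Figure 9 shows two complete path diagrams, both of which are compatible
with the given covariance matrix. In the left-hand panel, X is exogenous
and “causes” Y; then X and Y “cause” Z; finally, X, Y, Z “cause” W.
In panel (b), the flow of “causality” is reversed. The equations corresponding
to the left-hand panel are given as (22); panel (b) is described in (23).
The numerical coefficients are illegible in the source and are shown as [?]:

$$
\begin{aligned}
Y &= [?]\,X + \delta_1 \\
Z &= [?]\,X + [?]\,Y + \delta_2 \\
W &= [?]\,X + [?]\,Y + [?]\,Z + \delta_3;
\end{aligned} \tag{22}
$$

$$
\begin{aligned}
Z &= [?]\,W + \epsilon_1 \\
Y &= [?]\,Z + [?]\,W + \epsilon_2 \\
X &= [?]\,Y + [?]\,Z + [?]\,W + \epsilon_3.
\end{aligned} \tag{23}
$$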


The covariance matrix Σ is also compatible with the factor analysis
model (24), where the unobservable exogenous variable U causes all four
observables (right-hand panel of Fig. 9):

$$X = U + \xi_1, \quad Y = U + \xi_2, \quad Z = U + \xi_3, \quad W = U + \xi_4. \tag{24}$$

In each system of Eqs. (22)-(24), the error terms are assumed to be
independent and normally distributed with mean 0; error terms are independent
of the exogenous variable. As a technical matter, the covariance
matrix (20) is faithfully represented by both graphs in Fig. 8. Likewise, the
covariance matrix (21) is faithful to Fig. 9(a) and to 9(b). Proofs may be
based on (48) below.

Fig. 9. Two complete path diagrams and a factor analysis model, all having the same
covariance matrix.

To sum up, if a covariance matrix is faithful to a complete graph (with 
all pairs of vertices joined by arrows), it is faithful to many such graphs. 
Then correlational methods cannot tell the causes from the effects. SGS
[62] techniques work best when the graph is sparse; that is, relatively few
pairs of vertices are joined by arrows (Section 6). 
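A small sketch may help fix the idea that a one-factor model and complete recursive models can share a covariance matrix. It assumes NumPy, and the value `RHO` for the variance of the unobserved factor is hypothetical; it is not the paper's number. With four observables of the form U plus independent noise, the covariance matrix has equal off-diagonal entries, and regressing each variable on its predecessors gives the same coefficients whether the variables are taken in forward or reverse order.

```python
import numpy as np

RHO = 0.5   # hypothetical variance of the unobserved factor U
N = 4       # observables X, Y, Z, W, each with total variance 1

# One-factor model: common variance RHO, unique variance 1 - RHO.
sigma = RHO * np.ones((N, N)) + (1.0 - RHO) * np.eye(N)


def recursive_coefficients(cov):
    """Regress each variable on all earlier ones (a complete path diagram)."""
    return [
        np.linalg.solve(cov[:k, :k], cov[:k, k])
        for k in range(1, cov.shape[0])
    ]


forward = recursive_coefficients(sigma)               # X, then Y, Z, W
backward = recursive_coefficients(sigma[::-1, ::-1])  # W, then Z, Y, X
```

For this equal-correlation case the coefficients at the step with k predecessors all equal RHO / (1 + (k - 1) RHO), so the forward and reversed diagrams carry identical path coefficients, and neither is distinguishable from the factor model on the basis of the covariance matrix.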

11.4. Identifiability and Consistency

The focus continues to be on linear models. In statistical terminology, 
models are “identifiable” when they make different predictions about 
observables. For example, suppose you have two models for your data. If, 
for all data sets,

$$P(\text{data} \mid \text{model 1}) = P(\text{data} \mid \text{model 2}),$$

there is an obvious problem: the data cannot distinguish between the
models. If a path model is complete, or the faithfulness assumption is not
imposed, then the graph underlying a covariance matrix is not identifiable;
that is the message of Sections 11.1-11.3. By way of illustration, the
models in Fig. 7 are identifiable only if faithfulness holds. 

However, even if we assume that a covariance matrix is faithful to a
graph that is not complete, there may be several such graphs [62, p. 89].
For example, the following three graphs can generate the same covariance
matrix:

X → Y → Z,   X ← Y ← Z,   X ← Y → Z.

Thus, SGS do not seem to have succeeded in defining a class of graphs and 
covariance matrices for which identifiability holds [62, p. 194]. 
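The equivalence of three sparse graphs can be verified directly. The sketch below assumes NumPy, standardized variables, and hypothetical path coefficients `p` (between X and Y) and `q` (between Y and Z). It builds the covariance matrix implied by a chain from X to Z, the reversed chain, and a fork with Y as common cause.

```python
import numpy as np


def implied_cov(coef, err_var):
    """Covariance implied by linear equations V = coef @ V + errors."""
    n = len(err_var)
    total = np.linalg.inv(np.eye(n) - coef)
    return total @ np.diag(err_var) @ total.T


def three_graphs(p, q):
    """Covariance matrices of (X, Y, Z) for chain, reversed chain, and fork."""
    X, Y, Z = 0, 1, 2

    chain = np.zeros((3, 3))        # X -> Y -> Z
    chain[Y, X], chain[Z, Y] = p, q
    chain_err = [1.0, 1.0 - p**2, 1.0 - q**2]

    reverse = np.zeros((3, 3))      # Z -> Y -> X
    reverse[Y, Z], reverse[X, Y] = q, p
    reverse_err = [1.0 - p**2, 1.0 - q**2, 1.0]

    fork = np.zeros((3, 3))         # Y -> X and Y -> Z
    fork[X, Y], fork[Z, Y] = p, q
    fork_err = [1.0 - p**2, 1.0, 1.0 - q**2]

    return (
        implied_cov(chain, chain_err),
        implied_cov(reverse, reverse_err),
        implied_cov(fork, fork_err),
    )


p, q = 0.5, 0.4   # hypothetical path coefficients
cov_chain, cov_reverse, cov_fork = three_graphs(p, q)
```

In all three cases cov(X, Z) equals the product p times q, so X and Z are conditionally independent given Y, and the three matrices coincide.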
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In statistical terminology, estimators are “consistent,” provided that, as 
the sample gets larger and larger, these estimators come closer and closer 
to the population parameters. If the parameters are not identifiable,
however, consistency is problematic. 

SGS [62] seem to claim that their algorithms will find all the path 
diagrams compatible with a given covariance matrix. However, the theorems
suggest that the algorithms will at best find one such graph. SGS also
seem to claim that their algorithms are consistent. However, without an 
identifiability theory for linear models, they cannot really be talking about 
consistency. 

Statisticians do have the weaker notion of “Fisher consistency,” named 
after R. A. Fisher: when applied to data for the whole population, an 
estimator should reproduce the population parameters exactly. Theorems 
like 5.1 in SGS [62, p. 405] seem to demonstrate the analog of Fisher 
consistency, rather than anything stronger. Such theorems show that, given 
the population covariance matrix, the algorithms will produce one graph 
consistent with that matrix. 


11.5. Methodological Contributions

There is a connection between the theory of “directed acyclic graphs”
(DAGs) and the conditional independence of random variables. (See 
Darroch et al. [10], Kiiveri and Speed [34, 61], Pearl [43, 44], Verma and 
Pearl [49, 67], Geiger [22].) Much of this work is reviewed in SGS [62]. 
However, the mathematics of nonlinear causal diagrams seems to be 
irrelevant to the big question: how do we infer causation from association? 

Most of the applications in SGS are linear, i.e., based on path models. The
“nonlinear causal diagrams” turn out to be multinomial models for categorical
data; examples are in [62, pp. 147-151]. The issues about causation
are quite similar to those for linear models, although the technical details 
are different. 

This section will focus on path models. To describe the novelty in the 
SGS approach to estimation, suppose you have data from a path model 
and wish to estimate the model. Consider two cases: 

Case I. You know the classification of variables as to level; that is,
you know which variables are at level 0, which are at level 1, and so forth.

Case II. You do not know the classification of variables as to level. 

In Case I, SGS [62] have little to tell us about estimation; as to 
confounding, see Section 12.1. Some of their algorithms seem to be 
equivalent to regression; others may be less efficient. In Case II, SGS try 
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to estimate the classification of variables as well as the path coefficients. 
That is the methodological contribution. To estimate the classification, 
SGS must impose the faithfulness assumption (Section 11.2). It is disappointing
that SGS do not pin down the sense in which their algorithms are
successful (Section 11.4). 


12. MORE EXAMPLES AND SOME THEORY 

Section 12.1 explains how the faithfulness assumption and conditional 
independence are supposed to eliminate confounding. Section 12.2 discusses
omitted variables. Sections 12.3-12.5 revisit two examples from a
more mathematical perspective; the idea is to show the limits of correlational
methods.

12.1. Faithfulness, Conditional Independence, and Confounding 

The problems created by unobservable variables are well known. As 
indicated above, SGS [62] handle such problems by imposing the faithfulness
assumption. More specifically, the assumption is used to rule out
confounding. If confounding can be eliminated, the goal is in sight:
association may soon be converted into causation. This section, which is
based on work by Jamie Robins (personal communication), examines the
logic in more detail. Also see Pearl and Verma [49].

With some models, exact conditional independence forces a choice: 

• either there is no confounding by unmeasured common causes, 

• or the faithfulness assumption is violated. 

Near-independence is not good enough; associations may then be entirely 
spurious. Thus, causal inferences made by the SGS technique need exact 
conditional independence as well as the faithfulness assumption. 

This use of the faithfulness assumption has some theoretical interest. 
However, in order to base empirical work on such mathematical ideas, it 
would seem necessary to resolve the following questions, which SGS have 
not addressed: 

• Can the basic models be validated? 

• Can exact conditional independence be demonstrated? 

• Given exact independence, why is exact cancellation of confounded 
effects overwhelmingly less likely than the total absence of such effects?

As a practical matter, exact independence seems quite unusual. However,
the theory is worth understanding, and an example will make the
position clearer. Figure 10 shows a relatively simple diagram where faithfulness
and conditional independence would eliminate confounding. The
arrows denote causation, not mere association. Variables X, Y, Z are
observable; U is unobservable. Such unobservables are also called “confounders”
or “unmeasured common causes.” The joint distribution is
normal, and variables are standardized to have mean 0 and variance 1. The
covariance matrix for all four variables is shown in (25).

Fig. 10. The faithfulness assumption, conditional independence, and confounding. Variables
X, Y, Z are observable; U is unobservable. Arrows represent causation, not just
association. The lower-case letters on the arrows denote path coefficients. If a path coefficient
vanishes, the corresponding arrow must be deleted.
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Of course, only the covariance matrix (26) of the observables {X,Y,Z) 
can be estimated from the data. In particular, de is determined from the
observables, as cov(X, Y):

$$
\begin{array}{c|ccc}
 & X & Y & Z \\ \hline
X & 1 & & \\
Y & de & 1 & \\
Z & a + bde + fd & b + ade + fe & 1
\end{array} \tag{26}
$$


It may help to review the idea of faithfulness, in the context of our
example. Faithfulness is an assumption about unobservables; more specifically,
it is a constraint on the relationship between the full covariance
matrix (25) and the graph in Fig. 10. The assumption amounts to this:
independence relationships (conditional and unconditional) are determined
by the presence or absence of arrows in the diagram, not specific
parameter values.

Covariance matrices are symmetric; only the lower triangular part is shown. Entries are
assumed to be positive but less than 1. The matrix is assumed to be positive definite.

In particular, if the covariance matrix (25) is faithful to the diagram in 
Fig. 10, you cannot set any of the path coefficients to 0, except by deleting 
the corresponding arrow. An arrow from X to Z, say, entails that X has 
some causal effect on Z, no matter how small that effect may turn out to 
be. 

I return to more conventional issues. In our example, the parameter of 
interest is b, the causal effect of Y on Z. Due to the unmeasured 
confounder U, a regression of Z on X and Y produces a biased estimate 
of b. By a slightly tedious calculation, the coefficient of Y in the regression
equation is

$$b + \frac{fe\,(1 - d^2)}{1 - d^2 e^2}. \tag{27}$$

(For details on multiple regression, see the Appendix.) The bias in the 
regression estimate is the second term in (27). From a slightly different 
perspective, cov(Y, Z) in (26) measures the total association between Y
and Z. Part of this association is real: b measures the causal effect of Y
on Z. Alas, part of the association is spurious: ade + fe represents the
effects of the confounder U.
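The bias formula (27) can be checked numerically from the covariance matrix (26). The following is a sketch under stated assumptions: NumPy, standardized variables, and hypothetical values for the path coefficients a, b, d, e, f (the paper gives none). The function names are illustrative.

```python
import numpy as np


def observable_cov(a, b, d, e, f):
    """Covariance matrix (26) of X, Y, Z implied by the diagram in Fig. 10."""
    xy = d * e
    xz = a + b * d * e + f * d
    yz = b + a * d * e + f * e
    return np.array([
        [1.0, xy, xz],
        [xy, 1.0, yz],
        [xz, yz, 1.0],
    ])


def coef_of_y(cov):
    """Coefficient of Y when Z is regressed on X and Y."""
    return np.linalg.solve(cov[:2, :2], cov[:2, 2])[1]


def formula_27(b, d, e, f):
    """Causal effect b plus the bias term from the confounder U."""
    return b + f * e * (1.0 - d**2) / (1.0 - d**2 * e**2)


# Hypothetical path coefficients, chosen only for illustration.
a, b, d, e, f = 0.1, 0.3, 0.5, 0.4, 0.2

biased = coef_of_y(observable_cov(a, b, d, e, f))

# With a = 0 and f = 0 (no arrows into Z from X or U), the bias disappears.
unbiased = coef_of_y(observable_cov(0.0, b, d, e, 0.0))
```

With these numbers the regression coefficient exceeds b by the second term of (27); setting a and f to 0 recovers b exactly.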

The goal is to separate the real part of the association from the spurious 
part. The familiar obstacle is that we have only (26), not (25). And (26) 
does not suffice to separate b + ade + fe into its components. But, SGS 
might say, suppose that X and Z are conditionally independent given Y: 

$$\operatorname{cov}(X, Z \mid Y) = 0. \tag{28}$$

By (48) below, this means

$$\operatorname{cov}(X, Z) = \operatorname{cov}(X, Y) \times \operatorname{cov}(Y, Z). \tag{29}$$

A bit of algebra based on (25) shows that (29) is equivalent to

$$a(1 - d^2 e^2) + df = de^2 f. \tag{30}$$

Although de is known and 0 < de < 1, there are many possible ways to
solve Eq. (30). At this point, SGS would invoke the faithfulness assumption,
concluding that

$$a = 0, \quad f = 0. \tag{31}$$

The implication is that we have to remove the arrow from X to Z, as well 
as the arrow from U to Z. 
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Confounding has now been eliminated. On this basis, cov( Y, Z) = b, the 
whole of the association is real, and regression produces an unbiased 
estimate for the causal effect of Y on Z. At last, association has been 
converted into causation. (Of course, quite a lot of causality was built into 
Fig. 10 from the beginning—by assumption.) 

Those were the implications of exact conditional independence. On the
other hand, suppose we have approximate conditional independence:
cov(X, Z|Y) = .00001. Now the faithfulness assumption has no force.
Given the covariances in (26), we can match them by suitable choice of the
other parameters, even if a = b = 0.

With approximate conditional independence, observed associations can 
be entirely spurious. Thus, even in the realm of mathematics, faithfulness 
and conditional independence preclude confounding only when the 
independence is exact. To make the contrast sharper, let us assume 
faithfulness. 


• If cov(X, Z|Y) = 0, then the association between Y and Z is purely 
causal; the effects of the unmeasured common cause U do not confound 
the relationship between Y and Z. 

• If cov(X, Z|Y) = .00001, then confounding by unmeasured common 
causes may account for all of the observed association between Y and Z. 
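
The second case can be illustrated with a short calculation. The sketch assumes the standardized model used informally above with a = b = 0, so that the only source of association is U: cov(X, Y) = de, cov(Y, Z) = ef, cov(X, Z) = df. The observed covariances are hypothetical, chosen so that cov(X, Z|Y) = .00001.

```python
import math

def spurious_fit(c_xy, c_yz, c_xz):
    """Return (d, e, f) reproducing the three covariances when a = b = 0."""
    d = math.sqrt(c_xy * c_xz / c_yz)
    e = math.sqrt(c_xy * c_yz / c_xz)
    f = math.sqrt(c_xz * c_yz / c_xy)
    return d, e, f

c_xy, c_yz = 0.5, 0.5                    # hypothetical observed covariances
c_xz = c_xy * c_yz + 0.00001             # cov(X, Z | Y) = .00001, not 0
d, e, f = spurious_fit(c_xy, c_yz, c_xz)
print(d, e, f)                           # all three lie strictly between 0 and 1
print(d * e, e * f, d * f)               # reproduces c_xy, c_yz, c_xz
```

Under these assumptions the fit is admissible (each coefficient is below 1 in absolute value), so the whole of cov(Y, Z) is attributed to the common cause and none to Y.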

A similar problem must be considered when estimating path models 
from data (Section 11). Exact conditional independence, together with the 
faithfulness assumption, often permits us to identify the path diagram from 
the covariance matrix. However, approximate conditional independence is 
not enough; then, the covariance matrix will be faithful to a variety of 
complete graphs. 

A final example is the Timberlake-Williams model (Section 10). This 
model explains political exclusion (PO) in terms of foreign investment (FI), 
energy development (EN), and civil liberties (CV); the sample correlation 
matrix was shown in Table VI. Consider three scenarios for the “true” 
correlation matrix ρ: 

(i) Suppose ρ happens to equal the sample correlation matrix. 
Then, faithfulness obtains. 


Footnote (on matching the covariances in (26)): This matching assumes, for instance, that any two of the variables have positive 
covariance, given the third. To avoid violating the faithfulness assumption, if you set a and b 
to 0, erase the corresponding arrows; if that is distasteful, set a and b to small but positive 
values. The SGS logic would apply to a wide variety of diagrams; however, an arrow from Y 
to X, no matter how small the coefficient, spoils the show. 
(ii) Suppose the true correlation ρ(PO, FI) between foreign investment 
and political exclusion happens to vanish exactly. Then, the 
Timberlake-Williams model violates the faithfulness condition; presumably, 
that is SGS’s real complaint. 

(iii) If ρ(PO, FI) = .00001, faithfulness is restored. According to the 
SGS criteria, Timberlake and Williams are back in business. 

Within the framework of path models, scenario (ii) cannot be rejected at 
conventional significance levels; neither can (iii); and (i) represents our 
best estimate, subject to large uncertainties. SGS seize on hypothesis (ii), 
the only one that legitimates their critique. They are balking at shadows. 

12.2. Omitted Variables 

The problem of omitted variables was raised by Cliff Clogg at the Notre 
Dame conference, and this section paraphrases one of his points. There is 
a response variable Y, with explanatory variables X and Z; these may be 
construed as vectors. Suppose the data are generated according to the 
“true” model (32T): 


Y = Xβ + Zγ + ε. (32T) 

The parameter vectors β and γ are unknown and to be estimated from 
data by regression; it is β that is of primary interest. Subjects are assumed 
to be independent and identically distributed; (X, Z) and the error term ε 
are independent and jointly normal; all variables have expected value 0. 
Consider, too, the “restricted” model (32R), where β_R is defined so that 
E{Y|X} = Xβ_R; the constituents of (32R) may be computed from the true 
model. 

Y = Xβ_R + δ. (32R) 

In principle, the variables X, Y, and Z are all observable; X and Z may 
be correlated. However, investigators who do not know that Z is relevant 
may fit the restricted model R rather than the true model T. If so, the 
estimate of β can be quite biased. In the vernacular, β_R includes the 
effect of X on Y through Z. The covariance matrix of (X, Y) cannot 
distinguish between the two models, because the matrix can be generated 
by either model. Therefore, no statistical procedure based on that matrix 
can tell you whether the restricted model is right or wrong. 

Footnote (on the restricted model): Indeed, β_R = β + α, where α is obtained by the regression of Zγ on X. In other terms, 
Zγ = Xα + η, where η is normal with mean 0, independent of X. Then δ = ε + η. It may 
be seen that α depends linearly on γ. 
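
The relation β_R = β + α can be illustrated by simulation. This is a minimal sketch with scalar X and Z, assuming unit variances and a correlation r between X and Z; the parameter values, sample size, and seed are hypothetical, and the simulated estimate is only approximately equal to its population value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, gamma, r = 1.0, 2.0, 0.6            # hypothetical parameter values

# X and Z are jointly normal with correlation r; eps is independent of both.
X = rng.standard_normal(n)
Z = r * X + np.sqrt(1 - r**2) * rng.standard_normal(n)
eps = rng.standard_normal(n)
Y = beta * X + gamma * Z + eps            # "true" model: Y = X*beta + Z*gamma + eps

# Restricted model: regress Y on X alone (no intercept, since all means are 0).
beta_R_hat = (X @ Y) / (X @ X)

# alpha is the coefficient from regressing Z*gamma on X; here alpha = r * gamma.
alpha = r * gamma
print(beta_R_hat, beta + alpha)           # estimate vs. population value of beta_R
```

The restricted regression recovers β + α, not β. Nothing in the joint distribution of (X, Y) signals that Z was left out.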

12.3. On the Direction of Causality 

This section uses “cause” in its ordinary (perhaps undefinable) meaning, 
not as shorthand for certain kinds of covariation. I return to Judea Pearl’s 
example, shown in Fig. 2(a). Given the covariance matrix for X, Y, and Z, 
the SGS [62] algorithm will produce the graph shown in panel (a). If you 
tell the algorithm that omitted variables are a possibility, it will tell you 
that Y cannot cause X or Z. 

In the example, X, Y, and Z are the only observables, and their 
covariance matrix is faithful to the graph in Fig. 2(a). I claim that such 
information cannot by itself determine the direction of the causal flow. To 
substantiate this claim, I now construct two theories. In both, the 
observables X, Y, and Z will have the same covariance matrix, faithful to the 
graph in Fig. 2(a). However, the direction of the causal flow will be 
different in the two theories. 

Theory 1. I first generate X, Z, U as independent N(0, 1) variables; U 
is an unobservable error term. (If you want to intervene and change X or 
Z, now is your moment.) Then 

Y = X + Z + U. (33) 

According to Theory 1, X and Z cause Y, as suggested by Fig. 2(a). 

Theory 2. I first generate Y as N(0, 3). (If you want to intervene and 
change Y, now is your moment.) After a suitable pause, so that time’s 
arrow will delineate the flow of causality, I generate the errors V_1, V_2, and 
V_3 as independent N(0, 1/3) variables and then produce X, Z, and U, 
according to 

X = (1/3)Y + V_1 - V_2 

Z = (1/3)Y + V_2 - V_3 

U = (1/3)Y + V_3 - V_1. (34) 

In the second theory, Y causes X and Z. As far as the observables are 
concerned (namely, the joint distribution of X, Y, and Z), Theories 1 
and 2 agree. Furthermore, the joint distribution is faithful to the graph in 
Fig. 2(a). But the direction of causality is determined neither by the data 
nor by the mathematics. With correlational methods, causality follows 
from the assumptions about the unobservables. 

Footnote (to Section 12.2): See Clogg and Haritou [6], who make the following very interesting point. Adding 
variables that are correlated with ε can also bias the estimate of β; this “included variable” 
bias can be just as troublesome as the more familiar “omitted variable” bias; the latter 
problem cannot be solved by throwing variables into the model. The SGS treatment of 
omitted variables was discussed in Section 12.1. 
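
The agreement of Theories 1 and 2 on the observables can be verified by computing the covariance matrix of (X, Y, Z) under each. The sketch below assumes the reading of (34) in which X = Y/3 + V_1 - V_2 and Z = Y/3 + V_2 - V_3, with the V's independent N(0, 1/3); since everything is jointly normal with mean 0, equal covariance matrices imply equal joint distributions. The matrix names are hypothetical.

```python
import numpy as np

# Theory 1: sources (X, Z, U) are independent N(0, 1), and Y = X + Z + U.
A1 = np.array([[1.0, 0.0, 0.0],    # X
               [1.0, 1.0, 1.0],    # Y
               [0.0, 1.0, 0.0]])   # Z
cov1 = A1 @ np.eye(3) @ A1.T

# Theory 2: sources (Y, V1, V2, V3) are independent with variances 3, 1/3, 1/3, 1/3.
S2 = np.diag([3.0, 1 / 3, 1 / 3, 1 / 3])
A2 = np.array([[1 / 3, 1.0, -1.0, 0.0],    # X = Y/3 + V1 - V2
               [1.0,   0.0,  0.0, 0.0],    # Y
               [1 / 3, 0.0,  1.0, -1.0]])  # Z = Y/3 + V2 - V3
cov2 = A2 @ S2 @ A2.T

print(cov1)                          # covariance matrix of (X, Y, Z) under Theory 1
print(np.allclose(cov1, cov2))       # whether Theory 2 gives the same matrix
```

In both cases cov(X, Z) = 0 while X and Z are each correlated with Y, which is the pattern that an algorithm would read as X and Z causing Y.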

12.4. The AFQT Problem 

SGS [62, p. 242] seem to claim that, as a demonstrable mathematical 
fact, their procedures will find the right answers: 

Assuming the right variables have been measured, there is a straightforward 
solution to these problems: apply the PC, FCI, or other reliable algorithm, and 
appropriate theorems from the preceding chapters, to determine which X 
variables influence the outcome Y, which do not, and for which the question 
cannot be answered.... Then estimate the dependencies by whatever methods 
seem appropriate and apply the results of the previous chapter to obtain 
predictions of the effect of manipulating the X variables. No extra theory is 
required. We will give a number of illustrations .... 

The first example given by SGS to illustrate this claim is AFQT (Section 
9 above). To demonstrate that SGS are exaggerating more than a little, I 
pose a sharp mathematical question with the essential features of the 
AFQT problem. Then, I show the question to be undecidable by correlational 
methods. (Of course, when applied to the real example, both SGS 
and ordinary least squares made the right guess.) 

To set up the question, assume that X and Y are random variables; X is 
a vector; Y is scalar. 

Y is a linear combination of X’s, with fixed weights. (35) 
The observables are Y and V_1, V_2, .... (36) 

Some V’s are X’s; some V’s are ringers. (A “ringer” is a variable that does 
not enter into the linear combination for Y.) There are also unobservables, 
including the X’s that are not V’s. Assume too that 

The full joint distribution is multivariate normal, with mean 0. (37) 

You are given the covariance matrix for the observables, but not the full 
covariance matrix. The problem is to say which of the V’s are X’s and 
which are ringers. I claim this problem is not solvable, because I can 
produce two different theories leading to different classifications of the 
V’s, but having the same joint distribution for the observables. 
Theory 1. I use the covariance matrix for the seven observable subtests 
V_1 = NO, ..., V_7 = GS, together with the three unobservable subtests, 
CS, AS, and PC. (The subtests are listed in Table VIII, Section 12.5 
below.) The full distribution is defined to be jointly normal, and all 
variables have mean 0. Let Y = .5 × NO + AR + WK + PC, where NO, 
AR, and WK are observable but PC is unobservable. In this theory, 
V_1, V_2, V_3 are X’s; the remaining V’s are ringers. This theory happens to 
have been more or less correct, prior to 1989; see Eq. (42) in Section 12.5. 

Theory 2. Again, I use the covariance matrix for the seven observable 
subtests V_1 = NO, ..., V_7 = GS, together with the other three unobservable 
subtests CS, AS, PC. I create an auxiliary variable U, which is 
independent of the 10 subtests and has small variance. The distribution of 
these 11 variables is defined to be jointly normal, and all variables have 
mean 0. There are three additional unobservables, defined as 

T_1 = .25(AR + NO) + .5PC + U, (38) 

T_2 = .25(WK + NO) + .5PC + U, (39) 

T_3 = .75(AR + WK) - 2U. (40) 

Let 

Y = T_1 + T_2 + T_3. (41) 

In Theory 2, T_1, T_2, T_3 are the unobservables; all the V’s are ringers. The 
auxiliary variables U, CS, AS, PC serve only to define the joint distribution. 

Theory 1 and Theory 2 provide the same joint distribution for the 
observables. Therefore, no statistical procedure based on the joint 
distribution (like the SGS algorithms or any other correlational methods) can 
adjudicate between the two theories. 
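
The algebra behind this claim is short enough to check mechanically. The sketch below represents each variable by its coefficients on (NO, AR, WK, PC, U) and assumes the reading of (41) in which Y = T_1 + T_2 + T_3; the array names are hypothetical.

```python
import numpy as np

# Coefficients on the basis (NO, AR, WK, PC, U).
theory1_Y = np.array([0.5, 1.0, 1.0, 1.0, 0.0])   # Y = .5 NO + AR + WK + PC

T1 = np.array([0.25, 0.25, 0.00, 0.5, 1.0])       # (38): .25(AR + NO) + .5 PC + U
T2 = np.array([0.25, 0.00, 0.25, 0.5, 1.0])       # (39): .25(WK + NO) + .5 PC + U
T3 = np.array([0.00, 0.75, 0.75, 0.0, -2.0])      # (40): .75(AR + WK) - 2U
theory2_Y = T1 + T2 + T3                          # (41), as read above

print(theory2_Y)
print(np.allclose(theory1_Y, theory2_Y))
```

The auxiliary variable U cancels in the sum, so Y is the same function of the subtests in both theories. Hence Y and the V's have the same joint distribution, although the variables that enter the linear combination for Y differ.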

This section and the previous one demonstrate the obvious: you cannot 
infer cause and effect relationships by doing arithmetic on a correlation 
matrix, because association is not causation. The mathematical 
development in SGS avoids such problems only by imposing more or less arbitrary 
conditions (like faithfulness) on unobservable variables, as discussed in 
Sections 11.2 and 12.1. 

In the present section, neither Theory 1 nor Theory 2 fits into the SGS 
framework; Y is a deterministic function of the explanatory variables, with 
no stochastic error term: see Eq. (35). Furthermore, if U and PC are 
treated as variables rather than error terms in (38)-(40), the joint 
distribution in Theory 2 is, presumably, unfaithful to its causal graph. Similar 
comments apply to the previous section. 

12.5. Institutional Background on the AFQT 

The “Armed Services Vocational Aptitude Battery” (ASVAB) has 10 
subtests, including the seven listed in Table IV, Section 9 above. All 10 are 
shown in Table VIII. 
TABLE VIII 
The 10 Subtests in ASVAB 

 1. Numerical Operations       NO 
 2. Word Knowledge             WK 
 3. Arithmetical Reasoning     AR 
 4. Mathematical Knowledge     MK 
 5. Electronics Information    EI 
 6. Mechanical Comprehension   MC 
 7. General Science            GS 
 8. Coding Speed               CS 
 9. Auto & Shop Information    AS 
10. Paragraph Comprehension    PC 

Notes. The first seven were analyzed by SGS. ASVAB Form 17, July 1990. 


Until January, 1989 the AFQT was computed as 

AFQT = .5 × NO + AR + WK + PC. (42) 

After that date, NO was replaced by MK; a “verbal” score VE was defined 
as VE = WK + PC; and terms were standardized to have mean 0 and 
variance 1 on some calibration data (the “NORC 1985 sample”). AFQT 
was redefined as 

AFQT = MK_Z + AR_Z + 2 × VE_Z, (43) 

where the subscript Z denotes standardization. Throughout the period, 
raw scores were by Congressional requirement converted to percentiles 
based on the NORC sample. Presumably, the data used by SGS [62] come 
from 1988 or before, since they pick up formula (42) rather than (43); see 
Section 9 above. 


13. RESPONSES 

Formal statistical inference is, by its nature, conditional. If assumptions 
A, B, C,... hold, then H can be tested against the data. However, if 
A, B, C,... remain in doubt, so must inferences about H. Indeed, the 
statistical calculations may prove to be quite misleading. 

Many assumptions are made but only a few are tested. Those made 
without testing are called “maintained hypotheses.” They are usually 
statistical and often rather technical: linearity, independence, exogeneity, 
etc. Careful scrutiny of such assumptions would therefore seem to be a 
critical part of empirical work. 

Footnote (to Section 12.5): SGS [62] appear to be considering raw scores, and I follow suit. The material in 
that section was reported by Larry Hanser (personal communication); he refers to Welsh et al. [68, 
p. S, Table 3] and Eitelberg [14, p. 73]. 

In the social sciences, however, statistical assumptions are rarely made 
explicit, let alone validated. Questions provoke reactions that cover the 
gamut from indignation to obscurantism. We know all that. Nothing is 
perfect. Linearity has to be a good first approximation. The assumptions are 
reasonable. The assumptions do not matter. The assumptions are conservative. 
You cannot prove the assumptions are wrong. The biases will cancel. We can 
model the biases. We are only doing what everybody else does. Now we use 
more sophisticated techniques. What would you do? The decision-maker has to 
be better off with us than without us. We all have mental models; not using a 
model is still a model. 

With the SGS approach, responses are more subtle but no more empirical. 
Proponents often seem to take a Bayesian stance; faithfulness is 
justified on the grounds that the exceptional cases have measure 0 and 
must therefore be viewed as negligible a priori. However, the SGS 
approach is frequentist not Bayesian; the simulations, being done on 
finite-state computers, must concentrate in a set of measure 0; and the 
SGS class of models has measure 0 within larger classes of models. Indeed, 
from my perspective, the whole class of path models seems rather unlikely, 
given the intensity of the research effort and the paucity of convincing 
examples. The assumptions that diagrams are sparse and faithful stretch 
credibility even further. 

Attempts have also been made to justify the faithfulness assumption by 
appeals to continuity. If a covariance matrix is unfaithful, small changes to 
parameter values make it faithful. However, the same argument can be 
turned against correlational methods. For example, if a covariance matrix 
is faithful to an incomplete graph, small changes to hidden parameters 
make the graph complete and vitiate the SGS search procedures. Section 
12.1 points to another kind of instability in the SGS framework. The 
continuity defense (like the Bayesian argument) reflects an aesthetic 
judgment about modeling styles. Taste is no substitute for empirical 
verification. 

The SGS criteria for causality may also be defended as follows: it is 
unlikely that anything could produce the patterns of intercorrelation 
identified by SGS, other than causation; thus, correlational methods shift 
the burden of argument. Figures 5 and 6 should dispose of this idea. In 
real examples, the patterns identified by the SGS search algorithms can 
hardly represent cause-and-effect relationships. The burden would seem to 
be on the modelers: how can they recommend an algorithm that gives such 
results? 

Footnote (on measure 0): The “measure” here is the uniform distribution in Euclidean space, e.g., length, area, 
volume.... The SGS argument [62, p. 95] seems to be a variation on Laplace’s “principle of 
insufficient reason”: see Stigler [6i, p. 127]. 

Proponents of modeling can also be heard to argue that all of us make 
assumptions about unobservables. However, what is unobservable with one 
design may become observable with another. And some investigators still 
deal with unobservables the hard way—by doing the right studies. For 
example, take Fisher’s “constitutional hypothesis”: there may be a genetic 
factor that predisposes you to smoke and to get lung cancer, heart disease, 
etc. This putative genetic factor is the unobservable common cause for 
smoking and illness. 

The epidemiologists did not deal with the constitutional hypothesis by 
introducing special assumptions. Instead, they studied the matter 
empirically, using data from twin studies. For a recent report on the Swedish twin 
registry, see Floderus et al. [16]. On the Finnish twin registry, see Kaprio 
and Koskenvuo [31]. Data on the Danish twin registry are fragmentary. 
There are forthcoming data on the U.S. twin registry, which are quite 
strong [72]. The numbers on lung cancer are suggestive, but still small; this 
is a rare disease, even among smokers. The data on heart disease and total 
mortality, however, make the constitutional hypothesis untenable. 

13.1. A Comment from Judea Pearl 

Judea Pearl (personal communication) writes that 

Correlation-based model-searching schemes produce causal inferences with 
only limited guarantees. Yet such schemes have potential, if conducted under 
conditions that screen out accidental independencies while maintaining 
structural independencies, for example, longitudinal studies under slightly varying 
conditions. This assumes, of course, that under such varying conditions the 
parameters of the model will be perturbed, while its structure remains stable. 
Maintaining such delicate balance under changing conditions may be hard in 
real-life studies. However, considering the alternative of resorting to controlled, 
randomized experiments, such longitudinal studies are still an exciting 
opportunity. 

Additionally, any investigator who is searching for a causal model knowing 
that the parameters might be tied together by some hidden equation, like (17) 
[Section 11.2], is wasting time (and public funds). Such a model, even if correct, 
is bound to be useless, because without the assumption of autonomy (i.e., that 
each parameter can be perturbed without altering the others), the model 
cannot predict the effect of interventions or other changes.... 

Also see Pearl [45]; Pearl and Wermuth [50]. 

Footnote (on Fisher’s constitutional hypothesis): See SGS [62, pp. 298-299]. 
14. OTHER LITERATURE 

There is an extensive literature on the evaluation of models, going back 
at least to the Keynes-Tinbergen exchange (Keynes [32, 33]; Tinbergen 
[66]). Also see [37, 38]. For more recent discussions, with other citations to 
the literature, see Freedman [18, 19]. Many authors have tried to explain 
the basis for inferring causation by using regression. See, for example, [52, 
53], or [28, 29]. Of enthusiastic views on social-science modeling, there is 
no shortage; see, for instance, [60], or [2]. For recent discussions of causal 
modeling, see Humphreys and Freedman [73], Cox and Wermuth [8], or 
Pearl [74]. 


15. CONCLUSIONS 

SGS [63] have not succeeded in clarifying the circumstances under which 
causal inferences can be drawn from observed associations, nor have they 
invented a reliable engine for performing this feat. Their algorithms have 
some technical interest, but will make causal inferences only when 
causation is assumed in the first place. To be more explicit: If we assume that 
the arrows in a path diagram represent causation rather than association, 
and we also assume that the path diagram can be estimated from data, 
then indeed SGS can infer causation from association. The faithfulness 
assumption and exact conditional independence will together eliminate 
certain kinds of confounding. Even so, causality is assumed into the picture 
at the beginning, not proved at the end. As Nancy Cartwright says, “No 
causes in, no causes out.” 

The larger problem remains. Can quantitative social scientists infer 
causality by applying statistical technology to correlation matrices? That is 
not a mathematical question, because the answer turns on the way the 
world is put together. As I read the record, correlational methods have not 
delivered the goods. We need to work on measurement, design, theory. 
Fancier statistics are not likely to help much. 


Footnote (to Section 15, on Cartwright’s dictum): Cartwright [5, Chaps. 2, 3]. Also see Pearl and Verma [49]. 


APPENDIX: REGRESSION AND CONDITIONING 

For ease of reference, this appendix presents the usual formulas for 
computing regressions and conditional covariances. I begin with 
regression. Suppose ξ and η are random variables; ξ may be a row vector. We 
seek the column vector β of regression coefficients for η on ξ. Let 
C = E{ξ'ξ} and D = E{ξ'η}; the prime denotes matrix transposition. 
Assume C is positive definite. Then 

β = C^{-1} D. (44) 

Now η = ξβ + u, where u is automatically orthogonal to ξ. The mean 
square of u may be computed as follows: 

E(u^2) = E(η^2) - β'Cβ. (45) 

If ξ and η have mean 0, then C = cov(ξ) and D = cov(ξ, η); also, 
E(u) = 0. Likewise, if some component of ξ is a nonzero constant, 
E(u) = 0. If now the variables are jointly normal, u is independent of ξ. 

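
As a minimal sketch of (44) and (45), the following computes β and E(u^2) from given second moments. The function name and the numerical moments are hypothetical, and ξ is taken to have two components with η scalar.

```python
import numpy as np

def population_regression(C, D, mean_square_eta):
    """Return beta = C^{-1} D, as in (44), and E(u^2), as in (45)."""
    beta = np.linalg.solve(C, D)
    mean_square_u = mean_square_eta - beta @ C @ beta
    return beta, mean_square_u

# Hypothetical second moments: C = E{xi' xi}, D = E{xi' eta}, E(eta^2) = 1.
C = np.array([[1.0, 0.3], [0.3, 1.0]])
D = np.array([0.53, 0.56])
beta, ms_u = population_regression(C, D, mean_square_eta=1.0)
print(beta, ms_u)

# Orthogonality of u to xi: E{xi' u} = D - C beta, which should vanish.
print(D - C @ beta)
```

Solving the linear system directly, without forming the inverse of C, is a numerical convenience and does not change the formula.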
I turn to estimation. Recall Eq. (2), repeated here for ease of reference. 

Y = Xβ + ε. (2) 

In this equation, X is the “design matrix,” representing the explanatory 
variables. There is one row for each unit in the study, and one column for 
each variable. The entry in the ith row and jth column represents the jth 
variable, as observed on the ith unit in the study. X may include a column 
of ones if there is to be an intercept in the equation. Y is a column vector 
representing the dependent variable, whose ith component represents the 
value of Y for the ith unit in the study. ε is also a column vector, with one 
component for each unit in the study, representing the impact on Y of 
chance factors unrelated to X. Typically, there will be many fewer 
parameters than data points, so β has relatively few components. 

The ordinary least squares estimator for β is denoted by a hat and may 
be computed as 

β̂ = (X'X)^{-1} X'Y. (46) 

The covariance matrix for β̂, conditional on the design matrix, is computed 
as 

cov(β̂|X) = (X'X)^{-1} var(ε_i|X). (47) 

Of course, (46) is related to (44); this is seen by defining (ξ, η) as a row 
chosen at random from (X, Y). 

The “predicted values” and “residuals” are defined as Ŷ = Xβ̂ and 
e = Y - Ŷ. The residuals are automatically orthogonal to X. The residual 
sum of squares, minimized by the choice of β̂, is RSS = ||e||^2 = Σ_i e_i^2. Then 
var(ε_i|X) in (47) may be estimated as RSS/(n - p), where n is the 
number of data points and p is the number of explanatory variables. 
Variances will be found along the diagonal of the covariance matrix, and 
the standard error is computed as the square root of the variance. In 
deriving these formulas, it is assumed that, given X, the components of ε 
are conditionally independent and identically distributed, with mean 0. 
Suppose the model has an intercept. Then R^2 may be defined as 
R^2 = var{Ŷ}/var{Y}, where, e.g., 

var{Y} = (1/n) Σ_i (Y_i - Ȳ)^2,   Ȳ = (1/n) Σ_i Y_i. 

If all variables have mean 0, then R^2 = β̂'X'Xβ̂/(n × var{Y}). 
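
The estimation formulas can be collected in a few lines. This is a sketch, not a production routine: the function name, the simulated data, the seed, and the coefficients are hypothetical, and the explicit inverse is used only to mirror (46) and (47) (a least-squares solver such as `np.linalg.lstsq` would usually be preferred numerically).

```python
import numpy as np

def ols(X, Y):
    """OLS estimate as in (46), residuals, and the estimated version of (47)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - p)      # RSS / (n - p)
    cov_beta_hat = XtX_inv * sigma2_hat
    std_err = np.sqrt(np.diag(cov_beta_hat))
    return beta_hat, resid, cov_beta_hat, std_err

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # intercept + one variable
Y = X @ np.array([2.0, -1.0]) + rng.standard_normal(n)      # hypothetical coefficients
beta_hat, resid, cov_beta_hat, std_err = ols(X, Y)

Y_hat = X @ beta_hat
r_squared = np.var(Y_hat) / np.var(Y)       # var{Y_hat} / var{Y}, both with divisor n
print(beta_hat, std_err, r_squared)
```

The standard errors are meaningful only under the stated assumption that, given X, the errors are independent and identically distributed with mean 0. The code computes them whether or not that assumption holds.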

The usual formula for computing conditional covariances may be 
presented as follows. Let n > 2. Suppose X_1, X_2, ..., X_n are jointly normal. 
We seek the conditional covariance of X_1 and X_2, given X_3, X_4, ..., X_n. 
Let Σ be the covariance matrix of X_3, X_4, ..., X_n. Let κ_1 be the 
covariance of X_1 with X_3, X_4, ..., X_n; let κ_2 be the covariance of X_2 with 
X_3, X_4, ..., X_n. We view κ_1 and κ_2 as (n - 2) × 1 column vectors. The 
conditional covariance is given by 

cov(X_1, X_2|X_3, ..., X_n) = cov(X_1, X_2) - κ_1' Σ^{-1} κ_2. (48) 

The prime denotes matrix transposition. Details on the material in this 
appendix may be found in standard texts, for instance, [54]. 
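
Formula (48) translates directly into code. In the sketch below the function name and the covariance matrix are hypothetical; the variables are ordered (X, Z, Y) with unit variances, and the covariances are chosen so that cov(X, Z) = cov(X, Y) × cov(Y, Z), the situation of Section 12.1.

```python
import numpy as np

def conditional_cov(cov, i=0, j=1):
    """cov(X_i, X_j | all other variables) for jointly normal variables, as in (48)."""
    rest = [k for k in range(cov.shape[0]) if k not in (i, j)]
    sigma = cov[np.ix_(rest, rest)]     # covariance matrix of the conditioning variables
    k1 = cov[rest, i]                   # covariances of X_i with the conditioning variables
    k2 = cov[rest, j]                   # covariances of X_j with the conditioning variables
    return cov[i, j] - k1 @ np.linalg.solve(sigma, k2)

# Hypothetical covariance matrix for (X, Z, Y).
cov = np.array([[1.0, 0.2, 0.5],
                [0.2, 1.0, 0.4],
                [0.5, 0.4, 1.0]])
print(conditional_cov(cov))             # cov(X, Z | Y)
```

Changing the (X, Z) entry from 0.2 to 0.20001 gives a conditional covariance of .00001, the approximate-independence case discussed in Section 12.1.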


ACKNOWLEDGMENTS 


Many useful comments were made by Dick Berk, John Cairns, Cliff Clogg, Mark Hansen, 
Larry Hanser, Jerome Horowitz, Paul Humphreys, Ron Lee, Tony Lin, Bill Mason, Vaughn 
McKim, Judea Pearl, Diana Petitti, Jamie Robins, Tom Rothenberg, Terry Speed, and Steve 
Turner. Amos Tversky’s work on the paper amounted to collaboration. A version of this 
paper will also appear in a volume of proceedings, edited by Vaughn McKim and Steve 
Turner, published by Notre Dame Press. 


REFERENCES 

1. L. M. Bartels, Instrumental and “quasi-instrumental” variables, Amer. J. Pol. Sci. 35 
(1991), 777-800. 

2. L. M. Bartels and H. E. Brady, The state of quantitative political methodology, in 
“Political Science: The State of the Discipline II” (Ada W. Finifter, Ed.), Amer. Pol. Sci. 
Assoc., Washington, DC, 1993. 

3. P. M. Blau and O. D. Duncan, “The American Occupational Structure,” Wiley, New 
York, 1967. 

4. J. Cairns, “Cancer: Science and Society,” Freeman, San Francisco, 1978. 

5. N. Cartwright, “Nature’s Capacities and Their Measurement,” Clarendon Press, Oxford, 
1989. 

Source: https://www.industrydocuments.ucsf.edu/docs/ptgj0001 

DAVID FREEDMAN 


6. C. C. Clogg and A. Haritou, “The Regression Method of Causal Inference and a 
Dilemma with This Method,” Technical report, Department of Sociology, Pennsylvania 
State University, 1994. 

7. J. Cornfield, W. Haenszel, E. C. Hammond, A. M. Lilienfeld, M. B. Shimkin, and E. L. 
Wynder, Smoking and lung cancer: Recent evidence and a discussion of some questions, 
J. Nat. Cancer Inst. 22 (1959), 173-203. 

8. D. R. Cox and N. Wermuth, Linear dependencies represented by chain graphs, Statist. 
Sci. 8 (1993), 204-283 (with discussion). 

9. R. Daggett and D. Freedman, Econometrics and the law: A case study in the proof of 
antitrust damages, in L. LeCam and R. Olshen, eds. “Proceedings, Berkeley Conference 
in Honor of Jerzy Neyman and Jack Kiefer,” Vol. I, pp. 126-175, Wadsworth, Belmont, 
CA, 1985. 

10. J. N. Darroch, S. L. Lauritzen, and T. P. Speed, Markov fields and log-linear interaction 
models for contingency tables, Ann. Statist. 8 (1980), 522-539. 

11. A. Desrosieres, “La politique des grands nombres,” Editions la Decouverte, Paris, 1993. 

12. O. D. Duncan, “Introduction to Structural Equation Models,” Academic Press, New 
York, 1975. 

13. A. S. C. Ehrenberg and J. A. Bound, Predictability and Prediction, J. Roy. Statist. Soc. Ser. 
A 156, Part 2 (1993), 167-206. 

14. M. J. Eitelberg, “Manpower for Military Occupations,” Office of the Assistant Secretary 
of Defense (Force Management and Personnel), Washington, DC, 1988. 

15. R. F. Engle, D. F. Hendry, and J. F. Richard, Exogeneity, Econometrica 51 (1983), 
277-304. 

16. B. Floderus, R. Cederlof, and L. Friberg, Smoking and mortality: A 21-year follow-up 
based on the Swedish Twin Registry, Internat. J. Epidemiology 17 (1988), 332-340. 

17. D. Freedman, A note on screening regression equations, Amer. Statist. 37 (1983), 
152-155. 

18. D. Freedman, As others see us: A case study in path analysis, J. Educ. Statist. 12 No. 2 
(1987), (with discussion, whole issue). 

19. D. Freedman, Statistical models and shoe leather, in “Sociological Methodology 1991” (P. 
Marsden, Ed.), Amer. Sociol. Assoc., Washington, DC, 1991. 

20. D. Freedman and D. Lane, “Mathematical Methods in Statistics,” Norton, New York, 
1981. 

21. C. F. Gauss, “Theoria Motus Corporum Coelestium,” Perthes and Besser, Hamburg, 
1809; reprinted by Dover, New York, 1963. 

22. D. Geiger, “Graphoids: A Qualitative Framework for Probabilistic Inference,” Ph.D. 
dissertation, UCLA, Department of Computer Science, 1990. 

23. C. Glymour, A review of recent work on the foundations of causal inference, paper 
presented at the Notre Dame conference, 1993. 

24. C. Glymour, R. Scheines, P. Spirtes, and K. Kelly, “Discovering Causal Structure,” 
Academic Press, New York, 1987. 

25. M. Hakama, M. Lehtinen, P. Knekt, A. Aromaa, P. Leinikki, A. Miettinen, J. Paavonen, 
R. Peto, and L. Teppo, Serum antibodies and subsequent cervical neoplasms: A 
prospective study with 12 years of follow-up, Amer. J. Epidemiology 137 (1993), 166-170. 

26. J. Hausman, Specification tests in econometrics, Econometrica 46 (1978), 1251-1271. 

27. S. L. Hofferth and K. A. Moore, Early childbearing and later economic well-being, Amer. 
Soc. Rev. 44 (1979), 784-815. 

28. P. Holland, Statistics and causal inference, J. Amer. Statist. Assoc. 81 (1986), 945-960. 

29. P. Holland, Causal inference, path analysis, and recursive structural equations models, in 
“Sociological Methodology 1988,” (C. Clogg, Ed.), pp. 449-484, Amer. Sociol. Assoc., 
Washington, DC, 1988. 



30. International Agency for Research on Cam 
Evaluation of the Carcinogenic Risk of C 
France, 1986. 

31. J. Kaprio and M. Koskenvuo, Twins, smokii 
of smoking-discordant twin pairs, Social Sci. 

32. J. M. Keynes, Professor Tinbergen’s methoi 

33. J. M. Keynes, Comment on Tinbergen’s res 

34. H. Kiiveri and T. Speed, Structural analysis 
cal Methodology 1982,” (S. Leinhardt, Ed.). 

35. E. E. Leamer, Vector autoregressions for c 
Regimes,” (K. Brunner and A. Meltzer, i 
North-Holland, Amsterdam, 1985. 

36. A. M. Legendre, “Nouvelles methodes poi 
Courcier, Paris, 1805; reprinted by Dover, I 

37. T. C. Liu, Under-identification, structural 
(1960), 855-865. 

38. R. E. Lucas Jr., Econometric policy evahn 
Labor Markets,” (K. Brunner and A. Meltzs 
Public Policy, Vol. 1, pp. 19-64, with disco 
Econ., North-Holland, Amsterdam, 1976. 

39. G. S. Maddala, “Introduction to Economci 

40. C. F. Manski, Identification problems in tin 
1993,” (P. V. Marsden, Ed.), pp. 1-56, Blai 

41. P. Meehl, “Clinical versus Statistical Predit 
the Evidence,” University of Minnesota Pr, 

42. K. A. Moore and S. L. Hofferth, Factors :il 
Popul. Environ. 3 (1980), 73-98. 

43. J. Pearl, Fusion, propagation and structuri 
241-288. 

44. J. Pearl, “Probabilistic Reasoning in Intclli.r 
CA, 1988. 

45. J. Pearl, Comment: Graphical models, can 
266-273. 

46. J. Pearl, “On the Statistical interpretation 
Computer Science Department, UCLA, 19' 

47. J. Pearl, “On the Identification of Norn 
Report, Computer Science Department, I ' 

48. J. Pearl, D. Geiger, and T. Verma, llie 
Diagrams, Belief Nets and Decision AnaU 
67-87, Wiley, New York, 1989. 

49. J. Pearl and T. Verma, A theory of inle 
Representation and Reasoning: Proeeedo 
(J. A. Allen, R. Fikes, and E. Sandewall, 
Mateo, CA, 1991. 

50. J. Pearl and N. Wermuth, When can asso. 
“Proceedings, Fourth International tc’ork 
1993,” pp. 141-150; in “Artificial Intellu: 
Oldford, Eds.), Springer-Verlag, Berlin, I"' 

51. R. Peto and H. zur Hausen (Eds.), ••X’o.' 
Harbor Laboratory, Banbury Report No. 










30. International Agency for Research on Cancer, “Tobacco Smoking,” Monographs on the 
Evaluation of the Carcinogenic Risk of Chemicals to Humans, Vol. 38, IARC, Lyon, 
France, 1986. 

31. J. Kaprio and M. Koskenvuo, Twins, smoking and mortality: A 12-year prospective study 
of smoking-discordant twin pairs, Social Sci. Med. 29 (1989), 1083-1089. 

32. J. M. Keynes, Professor Tinbergen’s method, Econ. J. 49 (1939), 558-570. 

33. J. M. Keynes, Comment on Tinbergen’s response, Econ. J. 50 (1940), 154-156. 

34. H. Kiiveri and T. Speed, Structural analysis of multivariate data: A review, in 
“Sociological Methodology 1982,” (S. Leinhardt, Ed.), Jossey Bass, San Francisco, 1982. 

35. E. E. Leamer, Vector autoregressions for causal inference, in “Understanding Monetary 
Regimes,” (K. Brunner and A. Meltzer, Eds.); supplement to the J. Monetary Econ., 
North-Holland, Amsterdam, 1985. 

36. A. M. Legendre, “Nouvelles methodes pour la determination des orbites des cometes,” 
Courcier, Paris, 1805; reprinted by Dover, New York, 1959. 

37. T. C. Liu, Under-identification, structural estimation, and forecasting, Econometrica 28 
(1960), 855-865. 

38. R. E. Lucas Jr., Econometric policy evaluation: A critique, in “The Phillips Curve and 
Labor Markets,” (K. Brunner and A. Meltzer, Eds.), Carnegie-Rochester Conferences on 
Public Policy, Vol. 1, pp. 19-64, with discussion, supplementary series to the J. Monetary 
Econ., North-Holland, Amsterdam, 1976. 

39. G. S. Maddala, “Introduction to Econometrics,” 2nd ed., McGraw-Hill, New York, 1992. 

40. C. F. Manski, Identification problems in the social sciences, in “Sociological Methodology 
1993,” (P. V. Marsden, Ed.), pp. 1-56, Blackwell, Oxford, 1993. 

41. P. Meehl, “Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of 
the Evidence,” University of Minnesota Press, Minneapolis, 1954. 

42. K. A. Moore and S. L. Hofferth, Factors affecting early family formation: A path model, 
Popul. Environ. 3 (1980), 73-98. 

43. J. Pearl, Fusion, propagation and structuring in belief networks, Artif. Intell. 29 (1986), 
241-288. 

44. J. Pearl, “Probabilistic Reasoning in Intelligent Systems,” Morgan Kaufmann, San Mateo, 
CA, 1988. 

45. J. Pearl, Comment: Graphical models, causality and intervention, Statist. Sci. 8 (1993), 
266-273. 

46. J. Pearl, “On the Statistical Interpretation of Structural Equations,” Technical Report, 
Computer Science Department, UCLA, 1994a. 

47. J. Pearl, “On the Identification of Nonparametric Structural Equations,” Technical 
Report, Computer Science Department, UCLA, 1994b. 

48. J. Pearl, D. Geiger, and T. Verma, The logic of influence diagrams, in “Influence 
Diagrams, Belief Nets and Decision Analysis,” (R. M. Oliver and J. Q. Smith, Eds.), pp. 
67-87, Wiley, New York, 1989. 

49. J. Pearl and T. Verma, A theory of inferred causation, in “Principles of Knowledge 
Representation and Reasoning: Proceedings of the Second International Conference,” 
(J. A. Allen, R. Fikes, and E. Sandewall, Eds.), pp. 441-452, Morgan Kaufmann, San 
Mateo, CA, 1991. 

50. J. Pearl and N. Wermuth, When can association graphs admit a causal explanation? in 
“Proceedings, Fourth International Workshop on Artificial Intelligence and Statistics, 
1993,” pp. 141-150; in “Artificial Intelligence and Statistics” (P. Cheeseman and W. 
Oldford, Eds.), Springer-Verlag, Berlin, 1994. 

51. R. Peto and H. zur Hausen (Eds.), “Viral Etiology of Cervical Cancer,” Cold Spring 
Harbor Laboratory, Banbury Report No. 21, 1986. 




















52. J. Pratt and R. Schlaifer, On the nature and discovery of structure, J. Amer. Statist. Assoc. 
79 (1984), 9-21. 

53. J. Pratt and R. Schlaifer, On the interpretation and observation of laws, J. Econ. 39 
(1988), 23-52. 

54. C. R. Rao, “Linear Statistical Inference and Its Applications,” 2nd ed., Wiley, New York, 
1973. 

55. R. R. Rindfuss, L. Bumpass, and C. St. John, Education and fertility: Implications for the 
roles women occupy, Amer. Sociol. Rev. 45 (1980), 431-447. 

56. R. R. Rindfuss, L. Bumpass, and C. St. John, Education and the timing of motherhood: 
Disentangling causation, J. Marriage Family 46 (1984), 981-984. 

57. E. Seneta, Discussion, J. Educ. Statist. 12 (1987), 198-201. 

58. K. J. Sherman, J. R. Daling, J. Chu, et al., Genital warts, other sexually transmitted 
diseases, and vulvar cancer, Epidemiology 2 (1991), 257-282. 

59. H. Simon, The meaning of causal ordering, in “Qualitative and Quantitative Social 
Research,” (R. K. Merton, J. S. Coleman, and P. H. Rossi, Eds.), pp. 65-81, Free Press, 
New York, 1980. 

60. N. J. Smelser and D. R. Gerstein, “Behavioral and Social Science: Fifty Years of 
Discovery,” National Academy Press, Washington, DC, 1986. 

61. T. P. Speed and H. T. Kiiveri, Gaussian Markov distributions over finite graphs, Ann. 
Statist. 14 (1986), 138-150. 

62. P. Spirtes, C. Glymour, and R. Scheines, “Causation, Prediction and Search,” Lecture 
Notes in Statistics, Vol. 81, Springer-Verlag, New York/Berlin, 1993. 

63. P. Spirtes, R. Scheines, C. Glymour, and C. Meek, “TETRAD II,” Documentation for 
Version 2.2, Technical Report, Department of Philosophy, Carnegie Mellon University, 
Pittsburgh, PA, 1993. 

64. S. Stigler, “The History of Statistics,” Harvard University Press, Boston, 1986. 

65. M. Timberlake and K. Williams, Dependence, political exclusion and government 
repression: Some cross national evidence, Amer. Sociol. Rev. 49 (1984), 141-146. 

66. J. Tinbergen, Reply to Keynes, Econ. J. 50 (1940), 141-154. 

67. T. Verma and J. Pearl, Causal networks: Semantics and expressiveness, in 
“Uncertainty in AI 4” (R. Shachter, T. S. Levitt, and L. N. Kanal, Eds.), pp. 69-76, Elsevier 
Science, Amsterdam, 1990. 

68. J. R. Welsh, S. K. Kucinkas, and L. T. Curran, “Armed Services Vocational Battery 
(ASVAB): Integrative Review of Validity Studies,” Air Force Human Resources 
Laboratory Report AFHRL-TR-90-22, 1990. 

69. H. White, A heteroskedasticity-consistent estimator and a direct test for 
heteroskedasticity, Econometrica 48 (1980), 817-838. 

70. H. White, Maximum likelihood estimation of misspecified models, Econometrica 50 
(1982), 1-25. 

71. G. U. Yule, An investigation into the causes of changes in pauperism in England, chiefly 
during the last two intercensal decades, J. Roy. Statist. Soc. 62 (1899), 249-295. 

72. D. Carmelli and W. F. Page, Twenty-four year mortality in World War II US male 
veteran twins discordant for cigarette smoking, International Journal of Epidemiology 25 
(1996), 554-559. 

73. P. Humphreys and D. Freedman, The grand leap, Br. J. Phil. Sci. 47 (1996), 113-123. 

74. J. Pearl, Causal diagrams for empirical research, Biometrika 82 (1995), 669-710 (with 
discussion). 

75. N. Munoz, F. X. Bosch, K. V. Shah, and A. Meheus (Eds.), “The Epidemiology of Human 
Papillomavirus and Cervical Cancer,” International Agency for Research on Cancer, 
Lyon. Distributed in the U.S.A. by Oxford University Press, 1992. 
























